feat: Multi-token prediction (MTP) speculative decoding #48
Overview
Qwen3.5-4B has native MTP support (mtp_num_hidden_layers: 1 in the original config). Implement speculative decoding: use the MTP head to predict draft tokens, then verify them against the main model.
BLOCKED until a coherent 37+ tok/s baseline is reached.
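The draft-then-verify loop described in the overview can be sketched as follows. This is a minimal illustration under assumptions, not the planned engine: `main_next` and `draft_next` are hypothetical callables standing in for the main model and the MTP head, decoding is greedy, and the two main-model calls would be a single batched forward pass over both positions in a real implementation.

```python
from typing import Callable, List, Sequence

# Hypothetical interface: maps a token context to the next token id.
NextToken = Callable[[Sequence[int]], int]


def speculative_step(
    context: List[int], main_next: NextToken, draft_next: NextToken
) -> List[int]:
    """Run one greedy speculative step with a single draft token.

    Returns the tokens to append to the context: one token if the draft
    is rejected, two if it is accepted.
    """
    draft = draft_next(context)   # cheap guess (stand-in for the MTP head)
    target = main_next(context)   # main model's token for the same position
    if draft != target:
        # Rejected: discard the draft and keep the main model's token.
        return [target]
    # Accepted: the main model's prediction after the draft is also valid.
    # A real engine would get it from the same batched verify pass.
    bonus = main_next(context + [draft])
    return [draft, bonus]


def generate(
    prompt: Sequence[int], n_new: int, main_next: NextToken, draft_next: NextToken
) -> List[int]:
    """Generate n_new tokens by repeating speculative steps."""
    out = list(prompt)
    while len(out) - len(prompt) < n_new:
        out.extend(speculative_step(out, main_next, draft_next))
    return out[: len(prompt) + n_new]  # trim a possible extra bonus token
```

With greedy verification the output matches what the main model would produce alone; the draft only changes how many tokens each step yields.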
Current State
Implementation Plan
Phase 1: Weight loading
Phase 2: GPU MTP forward pass
Phase 3: Speculative decoding engine
Phase 4: GPU pipelining
Expected Speedup
With 1 MTP head producing 1 draft token:
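As a rough way to reason about this case, the per-step arithmetic can be sketched as below. It assumes each speculative step always yields one main-model token plus one draft token accepted with probability `accept_rate`, and that `overhead` is the extra cost of a speculative step relative to a plain decode step (MTP head plus verifying two positions). Both parameter names are illustrative, and neither value is given in this issue; they would have to be measured.

```python
def expected_tokens_per_step(accept_rate: float) -> float:
    """Expected tokens emitted per main-model step with one draft token."""
    if not 0.0 <= accept_rate <= 1.0:
        raise ValueError("accept_rate must be in [0, 1]")
    # One token is always produced; the draft adds one more when accepted.
    return 1.0 + accept_rate


def estimated_speedup(accept_rate: float, overhead: float) -> float:
    """Estimated throughput ratio versus plain decoding.

    overhead is the assumed extra cost of a speculative step as a fraction
    of a plain decode step (0.0 means the step costs the same).
    """
    if overhead < 0.0:
        raise ValueError("overhead must be non-negative")
    return expected_tokens_per_step(accept_rate) / (1.0 + overhead)
```

Under this model the upper bound with a single draft token is 2x (every draft accepted, zero overhead), and speculation only pays off when `accept_rate` exceeds `overhead`.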
Acceptance Criteria
Dependencies