[feature] Integrated single-pass MTP speculative decoding #50
Branch: feature/3-integrated-mtp (3 commits ahead of main)
Integrated MTP that runs in a single pass through the scheduler, with benchmark infrastructure for measuring MTP speedup on real models.
Commits:
c92124f feat(cache): add PagedQ4KVCache with incremental q4_0 quantization
201eb18 feat(mtp): integrated single-pass MTP speculative decoding
02276e3 benchmarks: real model baseline + MTP speed
Acceptance criteria:
Already implemented on main. mtp_forward is integrated directly into the scheduler (_mtp_step at line 1955) rather than as a separate mtp_stepper.py module. Main has full MTP support: idle/has_draft phases, draft verification, bonus token, output parser integration (PR #63), and mixed mode (PR #64). The benchmarks in this branch are nice-to-have but not required for the feature.
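For readers unfamiliar with the draft verification and bonus token steps mentioned above, here is a minimal sketch of greedy verification. It is not the code in _mtp_step; the function name verify_draft and its arguments are hypothetical, and it assumes greedy (argmax) decoding where the main model scores all draft positions in one forward pass.

```python
from typing import List


def verify_draft(draft: List[int], target: List[int]) -> List[int]:
    """Greedy speculative-decoding verification (illustrative sketch only).

    draft:  k tokens proposed by the draft (MTP) head.
    target: k + 1 tokens chosen by the main model in a single forward pass
            over the draft; target[i] is the main model's choice at the
            position of draft[i], and target[k] is its choice for the
            position right after the last draft token.
    Returns the tokens that can be emitted this step.
    """
    if len(target) != len(draft) + 1:
        raise ValueError("target must contain exactly one more token than draft")

    emitted: List[int] = []
    for proposed, verified in zip(draft, target):
        if proposed != verified:
            # First mismatch: emit the main model's token and stop. Later
            # target entries were conditioned on a rejected token, so they
            # are discarded.
            emitted.append(verified)
            return emitted
        emitted.append(proposed)

    # Every draft token was accepted, so the extra prediction is valid too.
    # This is the "bonus token".
    emitted.append(target[-1])
    return emitted


if __name__ == "__main__":
    print(verify_draft([5, 6], [5, 6, 7]))  # all drafts accepted plus bonus
    print(verify_draft([5, 6], [5, 9, 7]))  # second draft rejected
    print(verify_draft([], [3]))            # no pending draft
```

Under these assumptions, an empty draft reduces to ordinary one-token decoding, which is one way to think about the idle phase, while a non-empty draft corresponds to has_draft. Each step emits at least one token, so a rejected draft costs no more steps than plain decoding.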