[feature] Integrated single-pass MTP speculative decoding #50

Closed
opened 2026-05-14 23:29:17 +02:00 by sleepy · 1 comment
Owner

Branch: feature/3-integrated-mtp (3 commits ahead of main)

This branch integrates MTP speculative decoding so that it runs in a single pass through the scheduler, and adds benchmark infrastructure for measuring MTP speedup on real models.

Commits:

  • c92124f feat(cache): add PagedQ4KVCache with incremental q4_0 quantization
  • 201eb18 feat(mtp): integrated single-pass MTP speculative decoding
  • 02276e3 benchmarks: real model baseline + MTP speed

Acceptance criteria:

  • Single-pass MTP integrated into scheduler (no separate engine-level path)
  • Benchmark infrastructure for MTP speedup measurement
  • No regression in non-MTP generation quality or speed
  • Tests for the integrated MTP path
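
For the speedup criterion, a minimal sketch of how the measurement could be expressed is shown here. It is not the benchmark code from commit 02276e3; `tokens_per_second` and `mtp_speedup` are hypothetical helper names, and the `generate` callable stands in for whatever runs one generation and reports how many tokens it produced.

```python
import time
from typing import Callable


def tokens_per_second(generate: Callable[[], int], runs: int = 3) -> float:
    """Return the best observed throughput over `runs` calls.

    `generate` is a hypothetical callable that performs one full
    generation and returns the number of tokens it produced.
    """
    best = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        produced = generate()
        elapsed = time.perf_counter() - start
        if elapsed > 0:
            best = max(best, produced / elapsed)
    return best


def mtp_speedup(baseline_tps: float, mtp_tps: float) -> float:
    """Speedup of MTP decoding relative to the non-MTP baseline."""
    if baseline_tps <= 0:
        raise ValueError("baseline throughput must be positive")
    return mtp_tps / baseline_tps
```

A ratio above 1.0 would mean the MTP path is faster than the baseline. Running the same helper on the non-MTP path before and after the change is one way to check the no-regression criterion.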
Author
Owner

Already implemented on main. `mtp_forward` is integrated directly into the scheduler (`_mtp_step`, at line 1955) rather than living in a separate `mtp_stepper.py` module. Main has full MTP support: idle/has_draft phases, draft verification, the bonus token, output parser integration (PR #63), and mixed mode (PR #64). The benchmarks in this branch are nice to have but not required for the feature.
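
For readers unfamiliar with the draft verification and bonus token steps, here is a minimal sketch of the usual greedy acceptance rule in speculative decoding. It is an illustration under assumptions, not the code in `_mtp_step`: `verify_draft` is a hypothetical name, tokens are plain integers, and the idle/has_draft phase handling is not modelled.

```python
from typing import List, Sequence


def verify_draft(draft_tokens: Sequence[int],
                 target_tokens: Sequence[int]) -> List[int]:
    """Greedy verification of drafted tokens (hypothetical helper).

    target_tokens[i] is the main model's own choice at the position where
    draft_tokens[i] was proposed. It carries one extra entry: the main
    model's prediction for the position after the last draft token.
    Returns the tokens to emit for this step.
    """
    if len(target_tokens) != len(draft_tokens) + 1:
        raise ValueError("target_tokens must have one more entry than draft_tokens")

    emitted: List[int] = []
    for drafted, expected in zip(draft_tokens, target_tokens):
        if drafted != expected:
            # First mismatch: emit the main model's token and drop the rest.
            emitted.append(expected)
            return emitted
        emitted.append(drafted)

    # Every draft token matched, so the extra prediction is a free bonus token.
    emitted.append(target_tokens[-1])
    return emitted
```

With this rule every step emits at least one token, so output matches plain greedy decoding, and a fully accepted draft of length k yields k + 1 tokens from a single verification pass.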

Reference
sleepy/omlx#50