[feature] SpecPrefill attention-based sparse prefill for MoE #58

Closed
opened 2026-05-14 23:29:42 +02:00 by sleepy · 1 comment
Owner

The scheduler has SpecPrefill infrastructure (commit c5081e3), but it needs work to be production-ready for MoE models such as Mixtral and Qwen3-MoE.

Current state: draft code exists, disabled by default.

Acceptance criteria:

  • SpecPrefill drops irrelevant expert computations during prefill
  • Measurable prefill speedup on MoE models (>20%)
  • No quality regression (perplexity within 0.1%)
  • Works with continuous batching
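
The two quantitative criteria above could be checked with something like the sketch below. It is only an illustration: the function name, its parameters, and the interpretation of the thresholds are assumptions, not part of the existing scheduler code. It assumes "speedup" means the relative gain in prefill throughput (baseline time divided by SpecPrefill time, minus one) and that "within 0.1%" means a relative perplexity difference of at most 0.001 in either direction.

```python
def meets_criteria(base_prefill_s, spec_prefill_s, base_ppl, spec_ppl,
                   min_speedup=0.20, max_ppl_drift=0.001):
    """Hypothetical acceptance check; names and thresholds are assumptions.

    base_prefill_s / spec_prefill_s: prefill wall time in seconds without
    and with SpecPrefill, measured on the same prompts.
    base_ppl / spec_ppl: perplexity without and with SpecPrefill.
    """
    if spec_prefill_s <= 0 or base_ppl <= 0:
        raise ValueError("timings and perplexities must be positive")
    # Relative speedup: 0.25 means prefill is 25% faster than baseline.
    speedup = base_prefill_s / spec_prefill_s - 1.0
    # Relative perplexity change, treated symmetrically.
    ppl_drift = abs(spec_ppl - base_ppl) / base_ppl
    return speedup > min_speedup and ppl_drift <= max_ppl_drift
```

For example, `meets_criteria(1.0, 0.8, 10.0, 10.005)` passes (25% speedup, 0.05% perplexity drift), while a 10% time reduction or a 0.2% perplexity drift would fail.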
Author
Owner

Closed — not prioritized.

Reference
sleepy/omlx#58