[bug] Thinking tokens leak into user-visible output via MTP path #1
When using the MTP fast path in BatchedEngine.stream_generate(), Qwen3.6 thinking blocks are streamed verbatim to the user instead of being stripped.
The MTP fast path calls mlx-lm stream_generate() directly, bypassing oMLX thinking token handling.
Acceptance criteria:
Root cause identified:
The standard batching path sends RAW text (with <think> tags intact) to the client. The web UI then uses extractThinking() to separate thinking blocks from content and renders them separately.
The MTP fast path incorrectly tries to strip thinking on the SERVER with ThinkingParser, which fails because:
- ThinkingParser.feed() doesn't handle incremental detokenized text properly
Fix needed:
- Remove ThinkingParser from MTP fast path
- Send raw response.text (which is already detokenized incremental text from mlx-lm)
- Let extractThinking() handle thinking separation, as it did before MTP
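The proposed fix can be illustrated with a minimal Python sketch. It assumes the server simply forwards each detokenized chunk and that separation happens on the accumulated raw text. Both passthrough() and extract_thinking() are hypothetical names made up for this illustration: extract_thinking() is only a stand-in for the web UI's extractThinking(), whose real implementation is not shown in this issue, and the chunk strings stand in for successive response.text values.

```python
from typing import Iterable, Iterator, Tuple

THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"


def passthrough(chunks: Iterable[str]) -> Iterator[str]:
    """Hypothetical server side: forward each detokenized chunk unchanged."""
    for chunk in chunks:
        yield chunk


def extract_thinking(text: str) -> Tuple[str, str]:
    """Hypothetical client side: split accumulated raw text into (thinking, content).

    Works on the full text received so far, so a tag that was split across
    two chunks is matched once both halves have arrived.
    """
    thinking_parts = []
    content_parts = []
    pos = 0
    while True:
        start = text.find(THINK_OPEN, pos)
        if start == -1:
            # No further thinking block: the rest is user-visible content.
            content_parts.append(text[pos:])
            break
        content_parts.append(text[pos:start])
        body_start = start + len(THINK_OPEN)
        end = text.find(THINK_CLOSE, body_start)
        if end == -1:
            # Block still open (stream in progress): treat the rest as thinking.
            thinking_parts.append(text[body_start:])
            break
        thinking_parts.append(text[body_start:end])
        pos = end + len(THINK_CLOSE)
    return "".join(thinking_parts).strip(), "".join(content_parts).strip()


if __name__ == "__main__":
    # Chunk boundaries deliberately cut through both tags.
    chunks = ["<thi", "nk>plan the ans", "wer</th", "ink>Hello", " world"]
    raw = ""
    for piece in passthrough(chunks):
        raw += piece
    thinking, content = extract_thinking(raw)
    print("thinking:", thinking)
    print("content:", content)
```

Because the split runs over the accumulated text rather than over individual chunks, the server never needs to know where a tag starts or ends. One limitation of this sketch: while a tag is only partially received (for example "<thi"), the fragment is briefly classified as content until the next chunk arrives.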