Tool response token leakage #17
Describe the bug
With MTP enabled, the model sometimes writes "user" and almost always writes "<tool_response>" right before a tool response. Best guess: the MTP heads predict an upcoming tool response and the main model accepts those tokens.
To Reproduce
Steps to reproduce the behavior:
Run the model in a tool-calling harness with MTP enabled. The leak appears before almost every tool call.
Expected behavior
<tool_response> should not be generated, or should at least be stripped from the context by the server. The leaking "user" also suggests that the regex used for content cleaning still needs work and does not capture all edge cases.
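As an illustration of the stripping described above, here is a minimal sketch, not the server's actual cleaning code. It assumes tool calls are closed with a `</tool_call>` tag and that the leaked residue (a bare "user"/"assistant" role word, `<|im_end|>`, `<tool_response>`) appears only after the final closing tag. The name `strip_leaked_tail` and the pattern are hypothetical.

```python
import re

# Hypothetical pattern: the only things allowed after the final </tool_call>
# are whitespace, an EOS marker, a bare role word, or an opening
# <tool_response> tag. Anything else is treated as real content.
_LEAK_TAIL = re.compile(
    r"(?:\s|<\|im_end\|>|<tool_response>|\b(?:user|assistant)\b)*"
)

END_TAG = "</tool_call>"  # assumption: ChatML-style tool call delimiter


def strip_leaked_tail(text: str) -> str:
    """Drop leaked structural tokens that trail the last tool call."""
    idx = text.rfind(END_TAG)
    if idx == -1:
        return text  # no tool call, leave the output untouched
    cut = idx + len(END_TAG)
    # Truncate only when the whole tail is leak residue, so ordinary
    # prose containing the word "user" is never removed.
    if _LEAK_TAIL.fullmatch(text[cut:]):
        return text[:cut]
    return text
```

Anchoring on the closing tag rather than matching "user" anywhere in the output is a deliberate choice here: it avoids deleting legitimate text that happens to contain a role word.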
Cross-reference: with q4 KV quant enabled, the model stops before tool calls entirely (see #48). This suggests cache corruption may be preventing the model from reaching the tool call generation phase, which would mask any token leakage symptoms. If the model never generates a tool call, no tool response follows, so the <tool_response> leakage cannot occur.
If #48 is resolved and tool calls resume but token leakage reappears, this issue should be revisited.
Related fix: PR #60 (issue #54) trims stop tokens from MTP output. The EOS leak fix ensures <|im_end|> no longer appears in output. However, the leaking of structural tokens such as user/assistant markers is a different issue: these are not stop tokens but are generated by the MTP speculative heads. Verifying this needs server-level tool calling tests. Keeping open for now.
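A server-level test could assert on the returned message content with a small leak detector. The following is only a sketch: `find_leaks` and the token list are hypothetical, and how the content is fetched from the server is left out because it depends on the harness.

```python
import re

# Tokens that should never reach the client. <|im_end|> is the EOS marker
# mentioned above; the tool_response tags are the leak this issue tracks.
STRUCTURAL_TOKENS = ("<tool_response>", "</tool_response>", "<|im_end|>")

# A role word sitting alone on its own line is treated as a leaked marker.
_BARE_ROLE_LINE = re.compile(r"^\s*(user|assistant)\s*$", re.MULTILINE)


def find_leaks(content):
    """Return the structural tokens found in one response's content."""
    text = content or ""  # content may be None when only tool calls are returned
    leaks = [tok for tok in STRUCTURAL_TOKENS if tok in text]
    leaks += [m.group(1) for m in _BARE_ROLE_LINE.finditer(text)]
    return leaks
```

A test would then run a tool-calling request with MTP enabled and assert that `find_leaks(content)` is empty for every assistant message.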
Closing as duplicate of #56, which was fixed via PR #63 (output parser integration into MTP step).