Tool response token leakage #17

Closed
opened 2026-05-04 23:06:34 +02:00 by sleepy · 3 comments
sleepy commented 2026-05-04 23:06:34 +02:00 (Migrated from localhost:18431)

Describe the bug
When MTP is enabled, the model sometimes writes "user" and almost always writes "<tool_response>" right before a tool response. The best guess is that the MTP heads predict the start of a tool response and the main model accepts those drafted tokens.

To Reproduce
Steps to reproduce the behavior:
Run the model in a harness; the leak appears before almost every tool call.

Expected behavior
<tool_response> should not be generated, or should at least be stripped from the context by the server. The leaking "user" also suggests that the content-cleaning regex still needs work and does not cover all edge cases.
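As an illustration of the server-side stripping described above, here is a minimal Python sketch. It is not the project's actual content-cleaning code: the function name `strip_leaked_tail` and the regex are hypothetical, and it assumes the leak shows up only at the very end of the assistant output, as a bare "user" line and/or a literal <tool_response> tag.

```python
import re

# Hypothetical pattern (an assumption, not the project's real regex):
# an optional bare "user" line, then an optional <tool_response> tag,
# then only whitespace up to the end of the generated text.
_LEAK_TAIL = re.compile(
    r"(?:\n[ \t]*user[ \t]*)?(?:\n?[ \t]*<tool_response>)?\s*\Z"
)


def strip_leaked_tail(text: str) -> str:
    """Remove a leaked role word and/or <tool_response> tag from the end of text.

    Trailing whitespace is trimmed as a side effect. "user" is only removed
    when it sits alone on the last line, so ordinary prose is left untouched.
    """
    return _LEAK_TAIL.sub("", text, count=1)


if __name__ == "__main__":
    print(repr(strip_leaked_tail("calling tool\nuser\n<tool_response>")))
    print(repr(strip_leaked_tail("I will ask the user")))
```

Anchoring the pattern at the end of the text keeps a sentence that merely ends with the word "user" intact, which is one of the edge cases a looser regex would get wrong.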

Owner

Cross-reference: with q4 KV quant enabled, the model stops before tool calls entirely (see #48). This suggests cache corruption may be preventing the model from reaching the tool call generation phase, which would mask any token leakage symptoms. If the model never reaches a tool call, the <tool_response> leakage cannot occur.

If #48 is resolved and tool calls resume but token leakage reappears, this issue should be revisited.

Owner

Related fix: PR #60 (issue #54) trims stop tokens from MTP output. The EOS leak fix ensures <|im_end|> no longer appears in output. However, the leaking of structural tokens like user/assistant markers is a different issue: these are not stop tokens but are generated by the MTP speculative heads. Verifying this requires server-level tool calling tests. Keeping open for now.
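A server-level test of this kind could scan the streamed output for leaked markers. The sketch below is only an assumption about how such a check might look: `find_leaks` and `LEAK_PATTERNS` are hypothetical names, the chunks stand in for whatever the server streams back, and the marker list covers only the tokens mentioned in this thread.

```python
import re
from typing import Iterable

# Markers named in this thread; the pattern names are illustrative only.
LEAK_PATTERNS = {
    "tool_response_tag": re.compile(r"<tool_response>"),
    "eos_token": re.compile(r"<\|im_end\|>"),
    # A role word alone on a line, as opposed to "user" inside a sentence.
    "bare_role_line": re.compile(
        r"^[ \t]*(?:user|assistant)[ \t]*$", re.MULTILINE
    ),
}


def find_leaks(chunks: Iterable[str]) -> list[str]:
    """Return the names of all leak patterns found in the streamed output.

    Chunks are joined first so that a marker split across two chunks
    (for example "<tool_" + "response>") is still detected.
    """
    text = "".join(chunks)
    return [name for name, pat in LEAK_PATTERNS.items() if pat.search(text)]


if __name__ == "__main__":
    streamed = ["Let me check.\nus", "er\n<tool_", "response>"]
    print(find_leaks(streamed))
```

A test would then assert that `find_leaks(...)` returns an empty list for the assistant content of every tool-calling turn.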

Owner

Closing as duplicate of #56, which was fixed via PR #63 (output parser integration into MTP step).

Reference
sleepy/omlx#17