Tool response token leakage

sleepy commented

2026-05-04 23:06:34 +02:00

(Migrated from localhost:18431)

Describe the bug
The model writes "user" sometimes and almost always "<tool_response>" right before a tool response when MTP is enabled. Best assumption is that the MTP heads expect a tool response and the main model accepts it.

To Reproduce
Steps to reproduce the behavior:
Just run it in a harness, it does it before almost every tool call.

Expected behavior
<tool_response> to not be generated, or at least stripped from context by the server. the leaking 'user' also suggests the regex around the content cleaning still needs work and doesn't capture all edge cases.

**Describe the bug** The model writes "user" sometimes and almost always "<tool_response>" right before a tool response when MTP is enabled. Best assumption is that the MTP heads expect a tool response and the main model accepts it. **To Reproduce** Steps to reproduce the behavior: Just run it in a harness, it does it before almost every tool call. **Expected behavior** <tool_response> to not be generated, or at least stripped from context by the server. the leaking 'user' also suggests the regex around the content cleaning still needs work and doesn't capture all edge cases.

sleepy commented

2026-05-09 23:28:35 +02:00

Owner

Cross-reference: with q4 KV quant enabled, the model stops before tool calls entirely (see #48). This suggests cache corruption may be preventing the model from reaching the tool call generation phase, which would mask any token leakage symptoms. If the model never generates a tool response, the <tool_response> leakage cannot occur.

If #48 is resolved and tool calls resume but token leakage reappears, this issue should be revisited.

Cross-reference: with q4 KV quant enabled, the model stops before tool calls entirely (see #48). This suggests cache corruption may be preventing the model from reaching the tool call generation phase, which would mask any token leakage symptoms. If the model never generates a tool response, the <tool_response> leakage cannot occur. If #48 is resolved and tool calls resume but token leakage reappears, this issue should be revisited.

sleepy referenced this issue

2026-05-14 23:29:42 +02:00

[bug] Tool response token leakage (#17 reopen) #56

sleepy commented

2026-05-15 01:50:13 +02:00

Owner

Related fix: PR #60 (issue #54) trims stop tokens from MTP output. The EOS leak fix ensures <|im_end|> no longer appears in output. However, the leaking of structural tokens like user/assistant markers is a different issue — these are not stop tokens but are generated by MTP speculative heads. This needs server-level tool calling tests to verify. Keeping open for now.

sleepy commented

2026-05-15 18:32:01 +02:00

Owner

Closing as duplicate of #56, which was fixed via PR #63 (output parser integration into MTP step).

sleepy closed this issue

2026-05-15 18:32:01 +02:00

Rows
Columns

Tool response token leakage #17