CRITICAL: Cache misses with q4 KV quant — model stops before tool calls #48
Problem
After applying the fix for BatchQuantizedKVCache.finalize() _idx corruption (#47), the model still stops before tool calls when q4 KV quant is active.
Additionally, cache lookups appear to miss (no hits) when q4 KV quant is enabled. Tool calling works fine with the fp16 KV cache.
Model
Observed Behavior
Hypotheses
Related Issues
Acceptance Criteria
Merged via PR #62 (squash). Root cause: _apply_quantized_kv() was called on restored prefix caches, destroying their fp16 KV data. Fix: skip quantization when existing_cache is provided, both in _do_external_prefill and _transition_to_mtp.
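For anyone landing here later, a minimal sketch of the guard described above. This is not the code from PR #62: `ToyKVCache`, `quantize_in_place`, and `prepare_cache` are hypothetical stand-ins, and the "quantization" is a toy rounding step. It only illustrates the idea that a restored cache passed in as `existing_cache` is returned untouched, while a freshly created cache is quantized when KV quant is on.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyKVCache:
    """Hypothetical stand-in for a KV cache; not the project's class."""
    values: List[float] = field(default_factory=list)
    quantized: bool = False


def quantize_in_place(cache: ToyKVCache, levels: int = 16) -> None:
    """Toy lossy step: snap each value to a grid of `levels` points (q4-like)."""
    step = levels - 1
    cache.values = [round(v * step) / step for v in cache.values]
    cache.quantized = True


def prepare_cache(existing_cache: Optional[ToyKVCache],
                  kv_quant_enabled: bool) -> ToyKVCache:
    """Return the cache to use for prefill.

    Assumption: a restored prefix cache arrives as `existing_cache` and must
    not be re-quantized, since that would overwrite its stored KV data.
    """
    if existing_cache is not None:
        # Restored prefix cache: skip quantization entirely.
        return existing_cache
    cache = ToyKVCache()
    if kv_quant_enabled:
        quantize_in_place(cache)
    return cache


# Usage: a restored cache keeps its original values even with KV quant on.
restored = ToyKVCache(values=[0.123, 0.456])
same = prepare_cache(restored, kv_quant_enabled=True)
fresh = prepare_cache(None, kv_quant_enabled=True)
```

The point of the guard is that quantization is applied once, at cache creation, and never to a cache that already holds data from an earlier request.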