[cache] Skip q4 KV quantization on restored prefix caches (#48) #62

Merged
sleepy merged 2 commits from fix/48-q4-cache-prefix-restore into stable/pre-q4kv 2026-05-15 01:37:41 +02:00

Summary

  • Guard _apply_quantized_kv() in _do_external_prefill() to skip when existing_cache is provided
  • Root cause: restored prefix caches carry fp16 KV data that must be preserved; converting them to QuantizedKVCache destroys the cached data and causes cache misses

Closes #48

Test results

  • 80/80 scheduler tests pass
When existing_cache is provided (restored prefix cache), _apply_quantized_kv
was replacing fp16 KVCache objects with QuantizedKVCache, destroying the
restored prefix data and causing effective cache misses. Skip quantization
for restored caches to preserve the prefix data intact.
sleepy merged commit bc2e714dd3 into stable/pre-q4kv 2026-05-15 01:37:41 +02:00
Reference
sleepy/omlx!62