[cache] Preserve Q4 quantization in prefix cache reconstruction (#70) #80
Fixes: #70
Problem:
`reconstruct_cache()` always dequantized to fp16, wasting 4x memory on exact prefix cache hits.

Fix:
When `state` contains quantized tuples and `meta_state` has valid `bits`/`group_size`, return `StreamQuantKVCache` directly. Only fall back to fp16 when the data is already fp16 or the metadata is missing.

Results:
80 passed, 3 deselected
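The dispatch described under "Fix" can be sketched roughly as follows. This is a minimal illustration, not the code from this PR. Several pieces are assumptions: quantized tensors are taken to be `(packed, scales, biases)` tuples, `meta_state` is taken to be a dict with `"bits"` and `"group_size"` keys, the `StreamQuantKVCache` constructor shown here is invented, and `FP16KVCache`, `dequantize`, `default_bits` and `default_group_size` are hypothetical stand-ins for the real fp16 fallback path.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class StreamQuantKVCache:
    """Stand-in for the real class; this constructor signature is assumed."""
    keys: tuple
    values: tuple
    bits: int
    group_size: int


@dataclass
class FP16KVCache:
    """Hypothetical stand-in for a dense fp16 cache."""
    keys: Any
    values: Any


def dequantize(q: tuple, bits: int, group_size: int) -> Any:
    # Hypothetical placeholder: a real implementation would unpack and rescale.
    return ("fp16", q)


def _is_quantized(x: Any) -> bool:
    # Assumption: quantized tensors are stored as (packed, scales, biases) tuples.
    return isinstance(x, tuple) and len(x) == 3


def _valid_meta(meta: Optional[dict]) -> bool:
    # Metadata is usable only if both bits and group_size are positive ints.
    if not meta:
        return False
    bits, group_size = meta.get("bits"), meta.get("group_size")
    return (isinstance(bits, int) and bits > 0
            and isinstance(group_size, int) and group_size > 0)


def reconstruct_cache(state, meta_state, default_bits=4, default_group_size=64):
    """Rebuild a KV cache from a stored prefix state (hypothetical sketch)."""
    keys, values = state
    if _is_quantized(keys) and _is_quantized(values) and _valid_meta(meta_state):
        # Exact prefix hit with usable metadata: keep the packed data as-is.
        return StreamQuantKVCache(keys, values,
                                  meta_state["bits"], meta_state["group_size"])

    # Fallback: data is already fp16, or quantization metadata is missing.
    # default_bits / default_group_size are hypothetical placeholders.
    def to_fp16(x):
        return dequantize(x, default_bits, default_group_size) if _is_quantized(x) else x

    return FP16KVCache(to_fp16(keys), to_fp16(values))
```

With valid metadata the packed tuples are passed through untouched, so a prefix hit costs the same memory as the stored Q4 state. Only the two fallback cases pay for dequantization.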
934e2ed65f to 58fc048fdb