[CRITICAL] batch_quantized_cache extend() silently truncates KV data #67
Severity: CRITICAL - Data loss during continuous batching.

Location: `omlx/cache/batch_quantized_cache.py`, `extend()` method, lines ~180-215

Problem: The `pad` helper computes `right = max_size - c.keys[0].shape[2] - left`. When `left + c.keys[0].shape[2]` exceeds `max_size`, `right` becomes negative. The code then truncates actual KV tokens from the end.

Example: Cache A has `_idx=100` but buffer `shape[2]=200`. Cache B has `_idx=150`. For A: `left=50`, `right=200-200-50=-50`. A is trimmed by 50 tokens.

Fix: `max_size` should account for `(left + buffer_size)` across both caches.
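
The padding arithmetic can be reproduced without any tensor library. The following is a minimal sketch, not the project's code: the function names are hypothetical, and it assumes that `left` is the gap between a cache's `_idx` and the largest `_idx`, that the current `max_size` is the largest buffer length, and that cache B's buffer is also 200 long (the report does not give that value).

```python
def pad_amounts_current(idxs, buf_lens):
    """(left, right) padding per cache, using the formula from the report.

    Assumption: max_size is the largest buffer length (shape[2]).
    """
    max_idx = max(idxs)
    max_size = max(buf_lens)
    pads = []
    for idx, buf_len in zip(idxs, buf_lens):
        left = max_idx - idx                # align the write positions
        right = max_size - buf_len - left   # negative when left + buf_len > max_size
        pads.append((left, right))
    return pads


def pad_amounts_proposed(idxs, buf_lens):
    """Same computation, but max_size covers left + buffer length for every cache."""
    max_idx = max(idxs)
    lefts = [max_idx - idx for idx in idxs]
    max_size = max(left + buf_len for left, buf_len in zip(lefts, buf_lens))
    return [(left, max_size - buf_len - left) for left, buf_len in zip(lefts, buf_lens)]


# Cache A: _idx=100, buffer 200. Cache B: _idx=150, buffer 200 (assumed).
current = pad_amounts_current([100, 150], [200, 200])
proposed = pad_amounts_proposed([100, 150], [200, 200])
print(current)   # [(50, -50), (0, 0)]
print(proposed)  # [(50, 0), (0, 50)]
```

With the proposed `max_size` of 250, no `right` value is negative, so neither cache loses tokens; B is padded on the right instead of A being trimmed.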