[CRITICAL] batch_quantized_cache extend() silently truncates KV data #67

Closed
opened 2026-05-20 01:45:28 +02:00 by sleepy · 0 comments
Owner

Severity: CRITICAL - Data loss during continuous batching.

Location: omlx/cache/batch_quantized_cache.py, extend() method, lines ~180-215

Problem: The pad helper computes right = max_size - c.keys[0].shape[2] - left. When left + c.keys[0].shape[2] exceeds max_size, right becomes negative. The code then slices that many positions off the end of the buffer, truncating actual KV tokens.

Example: Cache A has _idx=100 and a buffer with shape[2]=200. Cache B has _idx=150. With max_size=200, padding A gives left=50 and right=200-200-50=-50, so A is trimmed by 50 tokens.
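
The arithmetic in the example can be reproduced with a small standalone sketch. It does not use the real `extend()` code: `pad_amounts` is a hypothetical helper, `left = max_idx - idx` is an assumption consistent with the reported `left=50`, and cache B's buffer length (not given in the report) is assumed to be 200.

```python
def pad_amounts(idx, buf_len, max_idx, max_size):
    """Return (left, right) padding for one cache, following the formula in this issue.

    Assumption: left is the gap between the largest _idx and this cache's _idx.
    """
    left = max_idx - idx
    right = max_size - buf_len - left
    return left, right


# Cache A: _idx=100, buffer length 200. Cache B: _idx=150.
# Cache B's buffer length is assumed to be 200 for illustration.
max_idx = max(100, 150)
max_size = max(200, 200)

left_a, right_a = pad_amounts(100, 200, max_idx, max_size)
print(left_a, right_a)  # 50 -50: a negative right means 50 positions get sliced off
```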

Fix: max_size should account for (left + buffer_size) across both caches, i.e. be at least the largest left + buffer_size, so that right is never negative.
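
One possible shape of that fix, sketched on plain integers rather than on the real cache objects. `required_size` is a hypothetical name, each cache is modelled as an `(idx, buf_len)` pair, and `left = max_idx - idx` is again an assumption, so this illustrates the idea rather than the actual patch.

```python
def required_size(caches, max_idx):
    """Smallest target size that keeps right >= 0 for every cache.

    caches: list of (idx, buf_len) pairs standing in for the real cache objects.
    """
    return max((max_idx - idx) + buf_len for idx, buf_len in caches)


caches = [(100, 200), (150, 200)]  # cache A, cache B (B's buffer length assumed)
max_idx = max(idx for idx, _ in caches)
size = required_size(caches, max_idx)

rights = []
for idx, buf_len in caches:
    left = max_idx - idx
    rights.append(size - buf_len - left)

print(size, rights)  # 250 [0, 50]
```

With the target size taken over `left + buf_len`, no cache gets a negative `right`, so nothing has to be sliced off.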

Reference
sleepy/omlx#67