CRITICAL: BatchQuantizedKVCache _idx corruption during finalize and state operations #45

Closed
opened 2026-05-09 18:02:06 +02:00 by sleepy · 0 comments
Owner

Bug 1: finalize() does not update _idx after rolling

finalize() rolls the cache tensors but leaves _idx unchanged, so _idx drifts away from the positions that actually hold valid tokens.
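
The drift can be illustrated with a toy model. This is a minimal sketch and not the real BatchQuantizedKVCache: ToyCache, roll_right, finalize_buggy and finalize_fixed are hypothetical names, the buffer is a plain Python list, and the direction and amount of the roll are assumptions chosen only to show how a write index goes stale when the contents are shifted underneath it.

```python
def roll_right(row, shift):
    """Rotate a list to the right by `shift` slots, wrapping around (like a tensor roll)."""
    shift %= len(row)
    return row[-shift:] + row[:-shift] if shift else list(row)


class ToyCache:
    """Hypothetical stand-in for one cache row; None marks a slot that was never written."""

    def __init__(self, capacity):
        self.buf = [None] * capacity
        self._idx = 0  # next write position, also the end of the visible region

    def append(self, tokens):
        for t in tokens:
            self.buf[self._idx] = t
            self._idx += 1

    def finalize_buggy(self, shift):
        # Contents move, but _idx still points at the pre-roll end.
        self.buf = roll_right(self.buf, shift)

    def finalize_fixed(self, shift):
        # Contents move and _idx moves with them.
        self.buf = roll_right(self.buf, shift)
        self._idx += shift

    def visible(self):
        return self.buf[: self._idx]


buggy = ToyCache(6)
buggy.append(["a", "b", "c"])
buggy.finalize_buggy(2)

fixed = ToyCache(6)
fixed.append(["a", "b", "c"])
fixed.finalize_fixed(2)

print(buggy.visible())  # [None, None, 'a']
print(fixed.visible())  # [None, None, 'a', 'b', 'c']
```

In the buggy variant two of the three real tokens fall outside the visible region, while never-written slots are inside it. In the fixed variant the leading None slots are still present, but they sit where left padding would be and can be masked out.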

Bug 2: state setter uses tensor allocated size instead of actual token count

The state setter sets _idx = self.keys[0].shape[2], which is the allocated size of the buffer, not the number of tokens actually written.
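
The difference between the two quantities can be shown without any tensor library. The sketch below is hypothetical: nested lists stand in for the key buffer, idx_from_allocated and idx_from_counts are made-up helper names, and the rule "valid tokens plus left padding" is an assumption about how offset and left_padding relate to _idx, not something confirmed from the code.

```python
def idx_from_allocated(key_rows):
    """Buggy rule: count every slot in the buffer, written or not."""
    return len(key_rows[0])


def idx_from_counts(offset, left_padding):
    """Assumed rule: per sequence, valid tokens plus left padding; take the batch maximum."""
    return max(o + p for o, p in zip(offset, left_padding))


# Batch of two sequences sharing an 8-slot buffer; None marks unwritten slots.
key_rows = [
    ["t0", "t1", "t2", "t3", "t4", None, None, None],    # 5 tokens, no padding
    ["pad", "pad", "t0", "t1", "t2", None, None, None],  # 3 tokens, 2 left-padding slots
]
offset = [5, 3]        # valid tokens per sequence
left_padding = [0, 2]  # padding slots per sequence

buggy_idx = idx_from_allocated(key_rows)
fixed_idx = idx_from_counts(offset, left_padding)

print(buggy_idx)  # 8
print(fixed_idx)  # 5
```

With the allocated size, three trailing slots that were never written are treated as cache content.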

Impact

  • Attention mask includes uninitialized padding
  • Model attends to garbage and emits EOS/stop tokens prematurely
  • Generation stops after ~50-100 thinking tokens
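
How an oversized _idx leaks padding into the attention mask can be sketched as follows. The function attention_mask and the rule "a key slot j is attendable when left_padding <= j < idx" are assumptions made for illustration; the real mask construction may differ.

```python
def attention_mask(idx, left_padding, allocated):
    """True where a key slot may be attended to, assuming left_padding <= j < idx."""
    return [left_padding <= j < idx for j in range(allocated)]


allocated = 8     # slots in the buffer
left_padding = 2  # leading padding slots
tokens = 3        # slots that hold real tokens

correct_idx = left_padding + tokens
good = attention_mask(correct_idx, left_padding, allocated)
bad = attention_mask(allocated, left_padding, allocated)  # _idx set to the allocated size

garbage_slots = sum(bad) - sum(good)
print(good)           # [False, False, True, True, True, False, False, False]
print(bad)            # [False, False, True, True, True, True, True, True]
print(garbage_slots)  # 3
```

Every extra True in the second mask is a slot whose contents were never written, which is consistent with the model attending to garbage.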

Fix

  1. Update _idx in finalize() to account for the roll.
  2. Have the state setter preserve _idx, or compute it from offset/left_padding.
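
A round-trip check is one way to guard the second fix. The sketch below uses a hypothetical ToyState class rather than the real cache. It assumes offset starts at minus the left padding and grows by one per written slot, so that offset + left_padding equals the write index. The real class may track these fields differently.

```python
class ToyState:
    """Hypothetical cache that restores _idx from offset/left_padding, not from buffer size."""

    def __init__(self, capacity, left_padding):
        self.keys = [[None] * capacity for _ in left_padding]
        self.left_padding = list(left_padding)
        self.offset = [-p for p in left_padding]  # assumption: offset starts at -left_padding
        self._idx = 0

    def append(self, n):
        # Writing n slots advances the shared write index and every per-sequence offset.
        self._idx += n
        self.offset = [o + n for o in self.offset]

    @property
    def state(self):
        return self.keys, self.offset, self.left_padding

    @state.setter
    def state(self, value):
        self.keys, self.offset, self.left_padding = value
        # Derive the write index from token counts, never from len(self.keys[0]).
        self._idx = max(o + p for o, p in zip(self.offset, self.left_padding))


src = ToyState(capacity=8, left_padding=[0, 2])
src.append(5)

dst = ToyState(capacity=8, left_padding=[0, 0])
dst.state = src.state

print(src._idx, dst._idx)  # 5 5
```

The invariant worth asserting in a regression test is that _idx is identical before saving and after restoring, regardless of how much of the buffer is allocated.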

File

omlx/cache/batch_quantized_cache.py


Reference
sleepy/omlx#45