CRITICAL: BatchQuantizedKVCache _idx corruption during finalize and state operations #45

Closed

opened 2026-05-09 18:02:06 +02:00 by sleepy · 0 comments

Bug 1: finalize() does not update _idx after rolling

`finalize()` rolls the cache tensors but leaves `_idx` unchanged, so `_idx` drifts away from the positions that actually hold valid tokens.
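
To make the drift concrete, here is a toy sketch using plain Python lists instead of the real tensors. It is not the omlx code: `roll_right`, the shift amount, and the roll direction are all assumptions chosen for illustration, since the issue does not say how `finalize()` rolls.

```python
PAD = None  # marks a slot that was never written


def roll_right(row, shift):
    """Circularly shift a list right by `shift` slots (toy stand-in for a tensor roll)."""
    shift %= len(row)
    return row[-shift:] + row[:-shift] if shift else list(row)


# Capacity 6, three tokens written, so the write cursor sits at 3.
row = ["a", "b", "c", PAD, PAD, PAD]
idx = 3

shift = 2                         # hypothetical roll amount
rolled = roll_right(row, shift)   # the tokens now occupy slots 2..4

stale_view = rolled[:idx]         # cursor left alone: "b" and "c" fall outside
fixed_idx = idx + shift           # cursor moved along with the data
fixed_view = rolled[:fixed_idx]   # all three tokens are covered again
```

In this toy, the stale cursor hides two of the three real tokens. Whether the real cursor has to move forward, backward, or by a per-row amount depends on what `finalize()` actually does.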

Bug 2: state setter uses tensor allocated size instead of actual token count

The state setter sets `_idx = self.keys[0].shape[2]`, which is the allocated size of the tensor, not the number of tokens actually stored in it.
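
A small worked example of how far the two numbers can diverge. It assumes the buffer grows in fixed-size blocks, which this issue does not confirm. `allocated_len`, the block size of 256, and the token count of 70 are illustrative values only.

```python
import math


def allocated_len(n_tokens, step):
    """Smallest multiple of `step` that can hold n_tokens (assumed growth policy)."""
    return max(1, math.ceil(n_tokens / step)) * step


step = 256       # assumed block size, for illustration only
n_tokens = 70    # tokens actually written to the cache

buggy_idx = allocated_len(n_tokens, step)  # what a shape[2]-style lookup would report
correct_idx = n_tokens                     # what the write cursor should be
garbage_slots = buggy_idx - correct_idx    # slots exposed that were never written
```

Under these assumptions, restoring state would expose 186 unwritten slots to everything that reads up to `_idx`.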

Impact

- Attention mask includes uninitialized padding
- Model attends to garbage and emits EOS/stop tokens prematurely
- Generation stops after ~50-100 thinking tokens
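
A rough illustration of the first two points: if the mask admits slots past the real tokens, part of the softmax weight goes to them. The scores below are made up and `garbage_attention_mass` is a hypothetical helper. This shows only how the weight leaks, not the model's actual behaviour.

```python
import math


def garbage_attention_mass(scores, idx, n_valid):
    """Share of softmax weight on slots at or beyond n_valid when the first idx slots are admitted."""
    admitted = scores[:idx]
    peak = max(admitted)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in admitted]
    return sum(weights[n_valid:]) / sum(weights)


# Five real keys followed by three uninitialized slots; all scores are invented.
scores = [0.0] * 8

leak_buggy = garbage_attention_mass(scores, idx=8, n_valid=5)  # mask covers all 8 slots
leak_fixed = garbage_attention_mass(scores, idx=5, n_valid=5)  # mask stops at the real tokens
```

With equal scores, the buggy mask puts 3/8 of the attention weight on uninitialized slots and the correct mask puts none. Real garbage values can score higher or lower than this.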

Fix

1. Update `_idx` in `finalize()` to account for rolling.
2. Have the state setter preserve `_idx`, or compute it from `offset`/`left_padding`.
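
A minimal sketch of the two state-setter options in fix 2, on a hypothetical miniature class (`ToyBatchCache` is not the omlx class). It assumes that, for each row, `offset` counts real tokens and `left_padding` counts leading pad slots, so that their sum equals the shared write cursor. Verify that convention against the actual implementation.

```python
class ToyBatchCache:
    """Hypothetical miniature of a batched KV cache; illustration only."""

    def __init__(self, keys, offset, left_padding, idx):
        self.keys = keys                  # rows padded out to the allocated length
        self.offset = offset              # assumed: per-row count of real tokens
        self.left_padding = left_padding  # assumed: per-row leading pad slots
        self._idx = idx                   # shared write cursor

    @property
    def state(self):
        # Option 1: carry _idx in the state so a restore can reuse it as-is.
        return self.keys, self.offset, self.left_padding, self._idx

    @state.setter
    def state(self, value):
        keys, offset, left_padding, *rest = value
        self.keys, self.offset, self.left_padding = keys, offset, left_padding
        if rest:
            self._idx = rest[0]
        else:
            # Option 2: rebuild the cursor from the bookkeeping, never from the shape.
            self._idx = max(o + p for o, p in zip(offset, left_padding))


# Allocated length 8, write cursor at 5; row 1 has two leading pad slots.
keys = [[1, 2, 3, 4, 5, 0, 0, 0], [0, 0, 6, 7, 8, 0, 0, 0]]
source = ToyBatchCache(keys, offset=[5, 3], left_padding=[0, 2], idx=5)
saved = source.state

restored_with_idx = ToyBatchCache([], [], [], 0)
restored_with_idx.state = saved          # option 1: _idx travels with the state

restored_without_idx = ToyBatchCache([], [], [], 0)
restored_without_idx.state = saved[:3]   # option 2: _idx recomputed
```

Both restores end with `_idx == 5` rather than the allocated length of 8. Option 1 changes the shape of the saved state, which matters if existing serialized caches have to stay loadable.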

File

omlx/cache/batch_quantized_cache.py

sleepy referenced this issue from a commit

2026-05-09 18:18:53 +02:00

fix(cache): BatchQuantizedKVCache _idx corruption during finalize and state ops (#45)

Reference

sleepy/omlx#45