CRITICAL: QuantizedKVCacheHandler reconstruct_cache overrides correct offset with meta_state #42

Closed
opened 2026-05-09 18:01:16 +02:00 by sleepy · 0 comments
Owner

Bug

QuantizedKVCacheHandler.reconstruct_cache() first sets cache.offset = keys[0].shape[2], the actual length of the concatenated tensor, which is correct. It then overwrites that value with meta_state[0], which holds only the first block's offset.

Impact

  • The reconstructed cache reports 256 tokens when it actually holds 768
  • RoPE position embeddings are computed from the wrong offset
  • The causal attention mask is incorrect: the model can only see the first block
  • Generation stops after the thinking tokens because the model never saw the full context
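
The 256 versus 768 mismatch in the list above can be reproduced with a small sketch. It is only an illustration: NumPy arrays stand in for the real quantized tensors, and the block size (256), block count (3), tensor layout (batch, heads, sequence, head_dim), and the string-valued meta_state tuple are all assumptions, not taken from the omlx code.

```python
import numpy as np

BLOCK_TOKENS = 256  # assumed per-block length, chosen to match the figures above
NUM_BLOCKS = 3      # assumed block count: 3 * 256 = 768

# Assumed layout: (batch, heads, seq_len, head_dim). Plain float arrays
# stand in for the packed quantized data.
blocks = [
    np.zeros((1, 2, BLOCK_TOKENS, 8), dtype=np.float32)
    for _ in range(NUM_BLOCKS)
]

# Reconstruction concatenates the blocks along the sequence axis.
keys = np.concatenate(blocks, axis=2)

# Correct offset: the length of the concatenated tensor.
offset_from_shape = keys.shape[2]

# Hypothetical metadata saved with the first block only; its first entry
# is that block's offset, not the offset of the whole cache.
first_block_meta_state = (str(BLOCK_TOKENS), "64", "8")
offset_from_meta = int(first_block_meta_state[0])

# Tokens that exist in the cache but fall outside the reported offset.
hidden_tokens = offset_from_shape - offset_from_meta

print(f"offset from tensor shape: {offset_from_shape}")
print(f"offset from meta_state:   {offset_from_meta}")
print(f"tokens hidden by the bug: {hidden_tokens}")
```

Anything that derives positions or mask length from the offset would then act as if only the first block existed, which is consistent with the symptoms listed above.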

Fix

Remove the meta_state offset override. Always use tensor shape for offset, matching KVCacheHandler behavior.
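
A minimal sketch of what the fix could look like, assuming a simplified cache object. SketchQuantizedCache, concat_blocks, make_block, and this reconstruct_cache signature are hypothetical stand-ins, not the actual code in omlx/cache/type_handlers.py. The sketch also assumes quantized tensors are (data, scales, biases) tuples and that meta_state is (offset, group_size, bits) stored as strings.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

import numpy as np

# Assumed representation of a quantized tensor: (data, scales, biases).
QuantizedTensor = Tuple[np.ndarray, np.ndarray, np.ndarray]


@dataclass
class SketchQuantizedCache:
    """Hypothetical stand-in for the real quantized KV cache object."""
    keys: Optional[QuantizedTensor] = None
    values: Optional[QuantizedTensor] = None
    offset: int = 0
    group_size: int = 64
    bits: int = 8


def concat_blocks(blocks: Sequence[QuantizedTensor]) -> QuantizedTensor:
    # zip(*blocks) groups all data parts, all scales, and all biases;
    # each group is concatenated along the sequence axis (axis 2).
    return tuple(np.concatenate(parts, axis=2) for parts in zip(*blocks))


def reconstruct_cache(key_blocks, value_blocks, meta_state=None):
    cache = SketchQuantizedCache()
    cache.keys = concat_blocks(key_blocks)
    cache.values = concat_blocks(value_blocks)

    # The offset always comes from the concatenated tensor length.
    cache.offset = cache.keys[0].shape[2]

    # Assumption: meta_state is (offset, group_size, bits) as strings.
    # meta_state[0] is deliberately ignored because it describes only
    # the first block.
    if meta_state is not None and len(meta_state) >= 3:
        cache.group_size = int(meta_state[1])
        cache.bits = int(meta_state[2])
    return cache


def make_block(tokens: int) -> QuantizedTensor:
    # Dummy block with assumed shapes (batch, heads, seq_len, packed_dim).
    return (
        np.zeros((1, 2, tokens, 4), dtype=np.uint32),
        np.zeros((1, 2, tokens, 1), dtype=np.float32),
        np.zeros((1, 2, tokens, 1), dtype=np.float32),
    )


blocks = [make_block(256) for _ in range(3)]
cache = reconstruct_cache(blocks, blocks, meta_state=("256", "64", "8"))
print(cache.offset)
```

Whether group_size and bits should still be read from meta_state is a design choice this sketch assumes; the issue itself only asks that the offset stop being taken from it.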

File

omlx/cache/type_handlers.py:501-510

Reference
sleepy/omlx#42