CRITICAL: QuantizedKVCacheHandler reconstruct_cache overrides correct offset with meta_state #42

Closed

opened 2026-05-09 18:01:16 +02:00 by sleepy · 0 comments

sleepy commented

2026-05-09 18:01:16 +02:00

Owner

Bug

`QuantizedKVCacheHandler.reconstruct_cache()` first sets `cache.offset = keys[0].shape[2]` (the actual length of the concatenated tensor), which is correct, but then overwrites it with `meta_state[0]`, which holds only the first block's offset.
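A minimal sketch of the pattern described above, not the actual code from `type_handlers.py`. `SketchCache`, `reconstruct_buggy`, the block lengths, and the shape of `meta_state` are hypothetical stand-ins; the only thing taken from the report is that element 0 of `meta_state` carries the first block's offset.

```python
from dataclasses import dataclass


@dataclass
class SketchCache:
    """Hypothetical stand-in for a quantized KV cache; tracks only the offset."""
    offset: int = 0


def reconstruct_buggy(block_lengths, meta_state):
    # Sequence length of the concatenated key tensor, i.e. what
    # keys[0].shape[2] would report once all blocks are joined.
    seq_len = sum(block_lengths)

    cache = SketchCache()
    cache.offset = seq_len             # correct: full concatenated length
    cache.offset = int(meta_state[0])  # bug: replaced by the first block's offset
    return cache, seq_len


# Three blocks of 256 tokens; meta_state was saved with the first block only.
cache, seq_len = reconstruct_buggy([256, 256, 256], meta_state=(256,))
print(cache.offset, seq_len)  # 256 768
```

The second assignment is the whole problem: the tensor holds 768 tokens, but the cache reports 256.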

Impact

- The reconstructed cache reports 256 tokens when it actually holds 768.
- RoPE position embeddings are wrong.
- The causal attention mask is incorrect: the model can only see the first block.
- Generation stops after the thinking tokens because the model never saw the full context.
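The effects listed above can be illustrated with a small sketch. It assumes that both the position indices for new tokens and the causal mask width are derived from `cache.offset`; the helper names `next_positions` and `visible_key_count` are hypothetical and do not come from the codebase.

```python
def next_positions(offset, num_new):
    """Position indices assigned to num_new incoming tokens (assumed: start at offset)."""
    return list(range(offset, offset + num_new))


def visible_key_count(offset, query_index):
    """Number of keys a causal mask exposes to the query at query_index."""
    return offset + query_index + 1


actual_len = 768    # tokens really stored in the reconstructed cache
stale_offset = 256  # offset left behind by the meta_state override

wrong = next_positions(stale_offset, 2)
right = next_positions(actual_len, 2)
print(wrong, right)  # [256, 257] [768, 769]

# With the stale offset, the first new token is limited to 257 keys
# even though 769 (768 cached plus itself) are present.
print(visible_key_count(stale_offset, 0), visible_key_count(actual_len, 0))  # 257 769
```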

Fix

Remove the `meta_state` offset override. Always take the offset from the tensor shape, matching `KVCacheHandler` behavior.
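A sketch of what the fix amounts to, under the same caveats: `SketchTensor`, `SketchCache`, `concat_seq`, and `reconstruct_fixed` are hypothetical stand-ins, and the `(batch, heads, seq_len, head_dim)` layout is an assumption chosen so that `shape[2]` is the sequence axis, as in the expression quoted in the report.

```python
from dataclasses import dataclass


@dataclass
class SketchTensor:
    """Hypothetical tensor stand-in; shape is (batch, heads, seq_len, head_dim)."""
    shape: tuple


@dataclass
class SketchCache:
    offset: int = 0


def concat_seq(blocks):
    """Join blocks along the sequence axis (axis 2), tracking shapes only."""
    batch, heads, _, head_dim = blocks[0].shape
    total = sum(block.shape[2] for block in blocks)
    return SketchTensor((batch, heads, total, head_dim))


def reconstruct_fixed(key_blocks, meta_state=None):
    # keys is a tuple here to mirror the keys[0].shape[2] expression from the report.
    keys = (concat_seq(key_blocks),)
    cache = SketchCache()
    cache.offset = keys[0].shape[2]  # single source of truth: the tensor itself
    # meta_state is deliberately not consulted for the offset.
    return cache


blocks = [SketchTensor((1, 8, 256, 64)) for _ in range(3)]
fixed = reconstruct_fixed(blocks, meta_state=(256,))
print(fixed.offset)  # 768
```

Deriving the offset from the tensor keeps it correct regardless of which block's metadata happened to be saved.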

File

omlx/cache/type_handlers.py:501-510


sleepy referenced this issue from a commit

2026-05-09 18:04:36 +02:00

fix(cache): remove meta_state offset override in QuantizedKVCacheHandler (#42)


Reference

sleepy/omlx#42
