[MEDIUM] BatchQuantizedKVCache.extend() pads using buffer capacity instead of used length #109

Closed
opened 2026-05-20 12:33:16 +02:00 by sleepy · 0 comments
Owner

In omlx/cache/batch_quantized_cache.py (lines 185-188), extend() computes max_size from self.keys[0].shape[2], which is the buffer capacity, instead of self._idx, which is the used length. When a cache's capacity is greater than its _idx, the stale data past _idx is carried into the padded result and corrupts KV positions.

Fix

Use self._idx and other._idx, not the buffer shape, for the size calculations in extend().
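
A minimal sketch of the difference, not the actual omlx code: it models only the sequence axis with plain Python lists. ToyCache, extend_by_capacity, extend_by_used_length, PAD and STALE are hypothetical names invented for illustration, and right-padding is an arbitrary choice here. The real extend() works on quantized arrays and may pad differently.

```python
from dataclasses import dataclass
from typing import List

PAD = 0     # filler written into padded positions
STALE = 99  # marks leftover data past the used length


@dataclass
class ToyCache:
    """Hypothetical stand-in for a batched cache, sequence axis only."""
    rows: List[List[int]]  # one row per batch entry, each `capacity` slots long
    idx: int               # number of valid slots, playing the role of _idx

    @property
    def capacity(self) -> int:
        return len(self.rows[0])


def extend_by_capacity(a: ToyCache, b: ToyCache) -> ToyCache:
    """Size the merge by buffer capacity, as the issue describes: stale slots survive."""
    max_size = max(a.capacity, b.capacity)
    rows = [r + [PAD] * (max_size - c.capacity) for c in (a, b) for r in c.rows]
    return ToyCache(rows, max(a.idx, b.idx))


def extend_by_used_length(a: ToyCache, b: ToyCache) -> ToyCache:
    """Size the merge by used length: trim each row to idx before padding."""
    max_len = max(a.idx, b.idx)
    rows = [r[: c.idx] + [PAD] * (max_len - c.idx) for c in (a, b) for r in c.rows]
    return ToyCache(rows, max_len)


a = ToyCache([[1, 2, STALE, STALE]], idx=2)  # capacity 4, only 2 slots used
b = ToyCache([[3, 4, 5]], idx=3)             # capacity 3, fully used

buggy = extend_by_capacity(a, b)
fixed = extend_by_used_length(a, b)

print(buggy.rows[0][: buggy.idx])  # [1, 2, 99]
print(fixed.rows)                  # [[1, 2, 0], [3, 4, 5]]
```

In the capacity-based variant the merged used length becomes 3, so the stale slot at position 2 of the first row now falls inside the valid range. Trimming to the used length first keeps only real entries and padding.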

Reference
sleepy/omlx#109