[cache] Replace trim-then-concat with geometric growth (#3) #7

Merged
sleepy merged 1 commit from refactor/3-geometric-kv-growth into main 2026-05-15 20:51:39 +02:00
Owner

Replaces the fixed +256-step trim-then-concat growth pattern in KVCache, BatchKVCache, QuantizedKVCache, and ChunkedKVCache with geometric (2x) doubling.

Old: Every 256 tokens, trim to exact size (copy), then concatenate with new 256-token buffer (another copy). 4 temporary copies per growth step per layer.

New: When capacity exceeded, allocate max(needed, current * 2) buffer, copy existing data in, swap. Single allocation, single copy. No trim step.

Growth spikes reduced from every 256 tokens to O(log n): 256->512->1024->2048->4096->8192->16384->32768 (7 spikes to reach 32K vs 128).

5 new tests added. 54/54 passed.

Replaces the fixed +256-step trim-then-concat growth pattern in KVCache, BatchKVCache, QuantizedKVCache, and ChunkedKVCache with geometric (2x) doubling. Old: Every 256 tokens, trim to exact size (copy), then concatenate with new 256-token buffer (another copy). 4 temporary copies per growth step per layer. New: When capacity exceeded, allocate max(needed, current * 2) buffer, copy existing data in, swap. Single allocation, single copy. No trim step. Growth spikes reduced from every 256 tokens to O(log n): 256->512->1024->2048->4096->8192->16384->32768 (7 spikes to reach 32K vs 128). 5 new tests added. 54/54 passed.
Replace trim-then-concat with geometric (2x) doubling in KV caches
Some checks failed
Build and Test / check_lint (pull_request) Has been cancelled
Build and Test / mac_build_and_test (pull_request) Has been cancelled
db0231bb66
Replace the fixed +256-step trim-then-concat growth pattern in KVCache,
BatchKVCache, QuantizedKVCache, and ChunkedKVCache with geometric doubling.

- No trim step (eliminates wasteful full-copy to step-aligned size)
- No concatenate (single allocate + copy instead of trim-then-concat)
- Geometric capacity: max(needed, current * 2) instead of current + 256
- Reduces growth spikes from every 256 tokens to O(log n)
- Add growth pattern and boundary correctness tests
sleepy merged commit db85c2a413 into main 2026-05-15 20:51:39 +02:00
Sign in to join this conversation.
No reviewers
No labels
feature
perf
refactor
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
sleepy/mlx-lm!7
No description provided.