[cache] Pre-allocate RotatingKVCache to max_kv_size upfront (#2) #5

Merged
sleepy merged 1 commit from perf/2-prealloc-kv-cache into main 2026-05-15 20:05:28 +02:00

Summary

When max_kv_size is set, RotatingKVCache and BatchRotatingKVCache now allocate the full max_size buffer on first update_and_fetch instead of growing 256 tokens at a time.

This eliminates repeated mx.concatenate calls during generation, which caused multi-GB transient memory spikes (4 temporary copies per layer per growth step).
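To give a rough sense of scale, the arithmetic behind the summary can be sketched as below. This is an illustration only: the 256-token step and the figure of 4 temporary copies per layer per growth step come from the description above, while the `max_kv_size` of 4096, the layer count of 32, and both helper names are hypothetical values picked for the example.

```python
import math


def growth_steps(max_kv_size: int, step: int = 256) -> int:
    """Buffer extensions needed to reach max_kv_size when growing `step` tokens at a time."""
    return math.ceil(max_kv_size / step)


def temporary_copies(max_kv_size: int, n_layers: int,
                     copies_per_step: int = 4, step: int = 256) -> int:
    """Total temporary copies across all layers while the cache grows to full size."""
    return growth_steps(max_kv_size, step) * n_layers * copies_per_step


# Hypothetical configuration, not taken from the PR.
steps = growth_steps(4096)                    # growth steps per layer before this change
copies = temporary_copies(4096, n_layers=32)  # temporary copies across the whole model
print(steps, copies)
```

With pre-allocation, each layer performs a single allocation on the first `update_and_fetch`, so the per-step copies counted here no longer occur.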

Changes

  • RotatingKVCache._update_in_place: pre-allocate max_size columns, copy existing data via slice assignment
  • BatchRotatingKVCache._update_in_place: same pattern
  • 3 new tests: verify pre-allocation and shape stability
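The pattern described in the list above can be sketched in plain Python. This is a toy stand-in, not the code from the PR: the class `PreallocatedRing` and its attributes are hypothetical, it uses a Python list instead of an `mx.array`, and it omits details of the real cache classes. It only shows the two properties the new tests are said to check: the buffer is allocated at full size on the first update, and its length never changes afterwards.

```python
class PreallocatedRing:
    """Toy rotating cache: a fixed-size buffer that is written in place."""

    def __init__(self, max_size: int):
        self.max_size = max_size
        self.buffer = None  # allocated lazily, on the first update
        self.offset = 0     # total number of tokens seen so far
        self._idx = 0       # next write position in the buffer

    def update_and_fetch(self, tokens):
        if self.buffer is None:
            # Allocate the full buffer once instead of growing step by step.
            self.buffer = [None] * self.max_size
        for tok in tokens:
            if self._idx == self.max_size:
                self._idx = 0  # wrap around and overwrite the oldest slot
            # Same-length slice assignment: writes in place, length unchanged.
            self.buffer[self._idx:self._idx + 1] = [tok]
            self._idx += 1
            self.offset += 1
        valid = min(self.offset, self.max_size)
        # Slots are returned in buffer order, which is not temporal order
        # once the write position has wrapped.
        return self.buffer[:valid]


cache = PreallocatedRing(max_size=4)
first = cache.update_and_fetch([1])
size_after_first = len(cache.buffer)
later = cache.update_and_fetch([2, 3, 4, 5])
size_after_wrap = len(cache.buffer)
```

A shape-stability check in this setting amounts to comparing `size_after_first` and `size_after_wrap` against `max_size`.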

Test results

49/49 passed (including 3 new pre-allocation tests)

Benchmark

Memory is flat after initial allocation — no spikes during generation.

Pre-allocate KV cache to max_kv_size upfront
Some checks failed
Build and Test / check_lint (pull_request) Has been cancelled
Build and Test / mac_build_and_test (pull_request) Has been cancelled
bcfea9f749
Instead of growing the rotating KV cache 256 tokens at a time via
concatenate, allocate the full max_size buffer on first update_and_fetch.
This eliminates multi-GB memory spikes from trim-then-concat during
generation.

Applies to both RotatingKVCache and BatchRotatingKVCache.
sleepy merged commit 07eaf3679c into main 2026-05-15 20:05:28 +02:00
Reference
sleepy/mlx-lm!5