[cache] Pre-allocate RotatingKVCache to max_kv_size upfront (#2) #5
Summary
When `max_kv_size` is set, `RotatingKVCache` and `BatchRotatingKVCache` now allocate the full `max_size` buffer on the first `update_and_fetch` call instead of growing 256 tokens at a time.

This eliminates repeated `mx.concatenate` calls during generation, which caused multi-GB transient memory spikes (4 temporary copies per layer per growth step).

Changes
- `RotatingKVCache._update_in_place`: pre-allocate `max_size` columns, copy existing data via slice assignment
- `BatchRotatingKVCache._update_in_place`: same pattern

Test results
49/49 passed (including 3 new pre-allocation tests)
Benchmark
Memory stays flat after the initial allocation, with no spikes during generation.
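
For reviewers who want to see the allocation pattern in isolation, here is a minimal sketch of the difference between growing a buffer 256 columns at a time and allocating it once on the first update. It is not the code in this PR: it uses NumPy instead of MLX so it runs anywhere, the class names `GrowingCache` and `PreallocatedCache` are hypothetical, and rotation (what happens once the buffer is full) is deliberately left out.

```python
import numpy as np

STEP = 256  # growth increment mentioned in the summary


class GrowingCache:
    """Hypothetical toy cache: grows its buffer STEP columns at a time."""

    def __init__(self):
        self.buf = None
        self.offset = 0
        self.allocations = 0  # counts buffer (re)allocations

    def update(self, x):
        # x has shape (heads, n_new, head_dim)
        heads, n_new, dim = x.shape
        capacity = 0 if self.buf is None else self.buf.shape[1]
        if self.offset + n_new > capacity:
            shortfall = self.offset + n_new - capacity
            extra_cols = -(-shortfall // STEP) * STEP  # round up to a multiple of STEP
            extra = np.zeros((heads, extra_cols, dim), dtype=x.dtype)
            # Concatenation builds a new, larger buffer and copies the old one into it.
            self.buf = extra if self.buf is None else np.concatenate([self.buf, extra], axis=1)
            self.allocations += 1
        self.buf[:, self.offset:self.offset + n_new] = x
        self.offset += n_new
        return self.buf[:, :self.offset]


class PreallocatedCache:
    """Hypothetical toy cache: allocates max_size columns on the first update."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.buf = None
        self.offset = 0
        self.allocations = 0

    def update(self, x):
        heads, n_new, dim = x.shape
        if self.buf is None:
            # Single allocation; later updates only write into it.
            self.buf = np.zeros((heads, self.max_size, dim), dtype=x.dtype)
            self.allocations += 1
        if self.offset + n_new > self.max_size:
            raise ValueError("buffer full: rotation is omitted from this sketch")
        self.buf[:, self.offset:self.offset + n_new] = x  # slice assignment, no concatenate
        self.offset += n_new
        return self.buf[:, :self.offset]


# Feed 1024 single-token updates (2 heads, head_dim 4) through both caches.
rng = np.random.default_rng(0)
tokens = rng.standard_normal((1024, 2, 1, 4)).astype(np.float32)
growing, prealloc = GrowingCache(), PreallocatedCache(max_size=1024)
for t in tokens:
    out_g = growing.update(t)
    out_p = prealloc.update(t)

print(growing.allocations, prealloc.allocations)
```

In this toy run the growing cache reallocates four times (once per 256 tokens) while the pre-allocated cache allocates once, and both return identical contents. The trade-off is that the full `max_size` buffer is paid for up front, even for short generations.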