[cache] Replace trim-then-concat with geometric growth (#3) #7
Replaces the fixed +256-step trim-then-concat growth pattern in KVCache, BatchKVCache, QuantizedKVCache, and ChunkedKVCache with geometric (2x) doubling.
Old: Every 256 tokens, trim the buffer to its exact size (one copy), then concatenate it with a new 256-token buffer (another copy). This costs 4 temporary copies per growth step per layer.
New: When capacity is exceeded, allocate a buffer of size max(needed, current * 2), copy the existing data in, and swap. Single allocation, single copy. No trim step.
Growth spikes drop from one every 256 tokens to O(log n) in sequence length: 256->512->1024->2048->4096->8192->16384->32768 (7 spikes to reach 32K tokens vs. 128 with the fixed step).
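The growth rule above can be sketched in a few lines. This is a minimal illustration only, not the code in this PR: `GrowableBuffer`, its fields, and the use of a plain Python list in place of a per-layer key/value tensor are all assumptions made for the example.

```python
from typing import List


class GrowableBuffer:
    """Hypothetical stand-in for one layer's key or value buffer."""

    def __init__(self, initial_capacity: int = 256) -> None:
        self.capacity = initial_capacity
        self.size = 0  # number of valid tokens currently stored
        self.data: List[int] = [0] * initial_capacity
        self.reallocations = 0  # counts growth events, for illustration

    def append(self, tokens: List[int]) -> None:
        needed = self.size + len(tokens)
        if needed > self.capacity:
            # Geometric growth: at least double, or jump straight to `needed`.
            new_capacity = max(needed, self.capacity * 2)
            new_data = [0] * new_capacity                    # single allocation
            new_data[: self.size] = self.data[: self.size]   # single copy
            self.data = new_data                             # swap
            self.capacity = new_capacity
            self.reallocations += 1
        # Write the new tokens into the spare capacity; no trim is needed.
        self.data[self.size : needed] = tokens
        self.size = needed
```

Appending one token at a time up to 32768 tokens triggers a reallocation only when the capacity is exceeded, and a single oversized append jumps directly to the needed size instead of doubling repeatedly.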
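The spike counts can be checked with a short calculation. The helper below is hypothetical and assumes the buffer starts at 256 slots; it also assumes each growth event copies the existing contents once, which understates the old path's cost since that path copied more than once per step.

```python
from typing import List, Optional


def growth_schedule(target: int, start: int = 256,
                    step: Optional[int] = None) -> List[int]:
    """Capacities visited while growing from `start` to at least `target`.

    With `step` set, capacity grows by a fixed amount; otherwise it doubles.
    """
    caps = [start]
    while caps[-1] < target:
        caps.append(caps[-1] + step if step else caps[-1] * 2)
    return caps


geometric = growth_schedule(32768)           # 256, 512, ..., 32768
fixed = growth_schedule(32768, step=256)     # 256, 512, 768, ..., 32768

# Growth events are transitions between capacities.
geometric_growths = len(geometric) - 1
fixed_growths = len(fixed) - 1

# Elements copied if each growth copies the old contents exactly once.
geometric_copied = sum(geometric[:-1])
fixed_copied = sum(fixed[:-1])

print(geometric_growths, len(fixed), geometric_copied, fixed_copied)
```

Under these assumptions the doubling schedule has 7 growth events, while the fixed-step schedule allocates 128 buffers in total (127 growth events after the first allocation). The copy totals show why the amortized cost per token is constant under doubling but grows with sequence length under a fixed step.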
5 new tests added. 54/54 passed.