[perf] Default max_kv_size to 32768 #4
Problem
max_kv_size defaults to None everywhere (generate.py:323, generate.py:1940), meaning KV caches grow unbounded. Long conversations OOM instead of gracefully degrading with sliding window attention.
Solution
Default max_kv_size=32768 in both generate_step() and BatchGenerator.__init__().
Required changes
generate_step(): change default from None to 32768
BatchGenerator.__init__(): change default from None to 32768
stream_generate(): propagate the default
generate() / other entry points: propagate max_kv_size=None
Acceptance criteria
Notes
Merged via PR #6 (squash). Default max_kv_size=32768 across all generation entry points. Combined with PR #5 pre-allocation, KV cache memory is now flat and bounded by default. 49/49 tests passed.
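As a rough illustration of the behaviour described in this issue, here is a minimal, self-contained sketch of a bounded cache with a 32768 default that wrapper entry points pass through unchanged. It is not the project's code: ToyKVCache, toy_generate_step, toy_stream_generate and DEFAULT_MAX_KV_SIZE are hypothetical names, and the toy cache simply evicts the oldest entry, which may differ from how the real sliding window cache decides what to keep.

```python
from collections import deque
from typing import Iterator, List, Optional

# Hypothetical constant; 32768 is the default value proposed in this issue.
DEFAULT_MAX_KV_SIZE = 32768


class ToyKVCache:
    """Stand-in for a KV cache that holds at most max_size entries.

    max_size=None reproduces the old behaviour: the cache grows without bound.
    """

    def __init__(self, max_size: Optional[int] = DEFAULT_MAX_KV_SIZE) -> None:
        # deque(maxlen=None) is unbounded; otherwise the oldest entry is
        # dropped automatically once the deque is full.
        self.entries: deque = deque(maxlen=max_size)

    def append(self, token: int) -> None:
        self.entries.append(token)

    def __len__(self) -> int:
        return len(self.entries)


def toy_generate_step(
    prompt: List[int],
    num_tokens: int,
    max_kv_size: Optional[int] = DEFAULT_MAX_KV_SIZE,
) -> Iterator[int]:
    """Yield the cache length after each generated token (toy model)."""
    cache = ToyKVCache(max_kv_size)
    for token in prompt:
        cache.append(token)
    for i in range(num_tokens):
        cache.append(i)  # placeholder for a real decoding step
        yield len(cache)


def toy_stream_generate(prompt: List[int], num_tokens: int, **kwargs) -> Iterator[int]:
    """Wrapper entry point: forwards kwargs untouched, so the inner default
    applies unless the caller passes max_kv_size explicitly."""
    yield from toy_generate_step(prompt, num_tokens, **kwargs)
```

With a small explicit limit such as max_kv_size=16, the reported cache length stops growing at 16. Passing max_kv_size=None opts back into unbounded growth.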