[cache] Default max_kv_size to 32768 (#4) #6

Merged
sleepy merged 2 commits from perf/4-default-max-kv-size into main 2026-05-15 20:40:11 +02:00
Summary

Changes max_kv_size default from None to 32768 across all generation entry points. Combined with PR #5 pre-allocation, KV cache memory is now flat and bounded by default.
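
As a rough illustration of what "bounded" means in practice, the sketch below estimates the upper limit on KV cache size for a given cap. The helper name `kv_cache_bytes` and the model dimensions (32 layers, 8 KV heads, head dimension 128, 2-byte elements) are assumptions chosen for the arithmetic, not values taken from this PR, and the actual pre-allocation in PR #5 may size its buffers differently.

```python
def kv_cache_bytes(
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    max_kv_size: int = 32768,  # the new default from this PR
    dtype_bytes: int = 2,      # assumption: 16-bit keys and values
) -> int:
    """Estimate the KV cache footprint for a single sequence at the cap."""
    # One key vector and one value vector per token, per KV head, per layer.
    per_token = 2 * num_layers * num_kv_heads * head_dim * dtype_bytes
    return per_token * max_kv_size


# Hypothetical model shape, used only to make the numbers concrete.
estimate = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128)
print(f"{estimate / 1024**3:.1f} GiB")  # prints "4.0 GiB"
```

With `max_kv_size=None` there is no such ceiling, since the cache grows with the number of tokens processed.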

Changes

  • generate_step(), BatchGenerator.__init__(), make_prompt_cache() — default 32768
  • CLI args in generate.py, chat.py, cache_prompt.py — default 32768
  • Tests updated to pass max_kv_size=None where unbounded behavior is needed
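
The default change listed above can be sketched with stand-ins that run without the library. Everything here is hypothetical: `make_prompt_cache_sketch`, `build_parser_sketch`, the two stub classes, and the `--max-kv-size` flag name are illustrative assumptions rather than the project's actual code. The sketch only shows the pattern of a bounded default with `None` as the explicit opt-out.

```python
import argparse
from dataclasses import dataclass
from typing import List, Optional, Union

DEFAULT_MAX_KV_SIZE = 32768  # the new default described in this PR


@dataclass
class UnboundedCacheStub:
    """Stand-in for a plain KV cache that grows without limit (hypothetical)."""


@dataclass
class BoundedCacheStub:
    """Stand-in for a size-capped KV cache (hypothetical)."""

    max_size: int


def make_prompt_cache_sketch(
    num_layers: int, max_kv_size: Optional[int] = DEFAULT_MAX_KV_SIZE
) -> List[Union[UnboundedCacheStub, BoundedCacheStub]]:
    # None opts out of the bound; any integer caps every layer's cache.
    if max_kv_size is None:
        return [UnboundedCacheStub() for _ in range(num_layers)]
    return [BoundedCacheStub(max_size=max_kv_size) for _ in range(num_layers)]


def build_parser_sketch() -> argparse.ArgumentParser:
    # The flag name is an assumption; the PR only states the CLI default.
    parser = argparse.ArgumentParser()
    parser.add_argument("--max-kv-size", type=int, default=DEFAULT_MAX_KV_SIZE)
    return parser


# Callers that relied on unbounded growth must now ask for it explicitly,
# which is what the updated tests do.
bounded = make_prompt_cache_sketch(num_layers=2)
unbounded = make_prompt_cache_sketch(num_layers=2, max_kv_size=None)
```

Keeping `None` as the opt-out means existing callers who need the old behavior change one argument, while everyone else gets the bounded cache without touching their code.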

Test results

49/49 passed

Default max_kv_size to 32768 for bounded KV cache memory
Some checks are pending
Build and Test / check_lint (pull_request) Waiting to run
Build and Test / mac_build_and_test (pull_request) Blocked by required conditions
b1617383f1
Change default max_kv_size from None to 32768 across all public
generation entry points. This pairs with the pre-allocation from PR #5
to give flat memory usage by default while still allowing users to
pass None for unbounded behavior.
Fix wasteful RotatingKVCache allocation in _make_new_cache and update docstring
Some checks failed
Build and Test / check_lint (pull_request) Has been cancelled
Build and Test / mac_build_and_test (pull_request) Has been cancelled
48ec392f27
Pass max_kv_size=None to make_prompt_cache in the bounded path so it
creates plain KVCache objects instead of throwaway RotatingKVCache(keep=4)
that are immediately replaced. Also clarify make_prompt_cache docstring
to note the 32768 default and that None disables bounded cache.
sleepy merged commit df607cab1a into main 2026-05-15 20:40:11 +02:00
Reference
sleepy/mlx-lm!6