[cache] Default max_kv_size to 32768 (#4) #6
Summary
Changes the max_kv_size default from None to 32768 across all generation entry points. Combined with PR #5 pre-allocation, KV cache memory is now flat and bounded by default.
Changes
- generate_step(), BatchGenerator.__init__(), make_prompt_cache(): default 32768
- generate.py, chat.py, cache_prompt.py: default 32768
- max_kv_size=None where unbounded behavior is needed
Test results
49/49 passed
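
For readers unfamiliar with the effect of the new default, here is a minimal sketch of the bounded versus unbounded behavior described in the summary. It is not this repository's code: ToyKVCache, fill, and DEFAULT_MAX_KV_SIZE are hypothetical names, and the drop-oldest eviction policy is an assumption made for illustration. Only the value 32768 and the meaning of None (unbounded) come from this PR.

```python
from collections import deque
from typing import Deque, Optional

DEFAULT_MAX_KV_SIZE = 32768  # default value taken from this PR


class ToyKVCache:
    """Hypothetical stand-in for a KV cache; stores one entry per token."""

    def __init__(self, max_kv_size: Optional[int] = DEFAULT_MAX_KV_SIZE):
        # deque(maxlen=None) is unbounded; with an integer maxlen the oldest
        # entry is dropped once the deque is full (assumed eviction policy).
        self.max_kv_size = max_kv_size
        self.entries: Deque[int] = deque(maxlen=max_kv_size)

    def append(self, token: int) -> None:
        self.entries.append(token)

    def __len__(self) -> int:
        return len(self.entries)


def fill(cache: ToyKVCache, n_tokens: int) -> int:
    """Append n_tokens entries and return the resulting cache length."""
    for t in range(n_tokens):
        cache.append(t)
    return len(cache)


if __name__ == "__main__":
    print(fill(ToyKVCache(), 40_000))                  # bounded by the default
    print(fill(ToyKVCache(max_kv_size=None), 40_000))  # opt-in unbounded
```

In this sketch, a caller that needs the old unbounded behavior opts in explicitly with max_kv_size=None, mirroring the last item in the change list.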