[perf] Full QuantizedKVCache batching support (#39) #40
Summary
Fixes #39 by implementing full QuantizedKVCache support in batching and prefix cache paths.
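One reason a `__len__` that always returns 0 (the first item under Changes) can matter beyond the wrong count: in Python, an object whose `__len__` returns 0 is falsy, so any `if cache:` style check treats a populated cache as empty. The sketch below illustrates that with toy classes. `ToyQuantizedCache`, `BrokenToyCache`, `offset`, and `append` are hypothetical names for illustration only, not the real QuantizedKVCache implementation, and whether the affected code paths relied on truthiness is an assumption.

```python
class ToyQuantizedCache:
    """Hypothetical stand-in for a KV cache; not the real QuantizedKVCache."""

    def __init__(self) -> None:
        self.offset = 0  # assumed counter: number of tokens stored so far

    def append(self, n_tokens: int) -> None:
        # Record that n_tokens more tokens were written to the cache.
        self.offset += n_tokens

    def __len__(self) -> int:
        # Report the actual token count.
        return self.offset


class BrokenToyCache(ToyQuantizedCache):
    def __len__(self) -> int:
        # Mimics the old behaviour described in this PR: always 0.
        return 0


fixed = ToyQuantizedCache()
fixed.append(128)

broken = BrokenToyCache()
broken.append(128)

print(len(fixed), bool(fixed))    # 128 True
print(len(broken), bool(broken))  # 0 False
```

Both toy caches hold 128 tokens, but the broken one reports length 0 and evaluates as empty in a boolean context.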
Changes
- QuantizedKVCache.__len__ now returns the actual token count (was returning 0)
- BatchQuantizedKVCache with full batching interface (update, extract, merge, filter, trim)
- _make_cache and _merge_caches now support QuantizedKVCache
- BatchKVCache.extract() now preserves quantization
- QuantizedKVCacheHandler for prefix cache reconstruction
- _apply_quantized_kv() now uses the correct update_and_fetch API
Test Results
Memory Impact
With Qwen3.6-27B at 50K context: