[perf] Prefill speed optimization for long contexts #34
Current State
With QuantizedKVCache (q4_0) enabled, chunked prefill achieves ~200 tok/s.
Observed Behavior
Potential Optimizations
Benchmarks Needed
Acceptance Criteria
Needs benchmark data before implementation. Before tackling this, we need:
- performance timing instrumentation

Without this data, any implementation would be shooting in the dark.
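As a starting point for the timing instrumentation, here is a minimal sketch of a per-chunk timer for chunked prefill. It assumes a hypothetical `prefill_chunk(tokens)` callable that runs one prefill chunk against the (quantized) KV cache and only returns once the work has finished. `chunk_size=512`, `timed_chunked_prefill`, `ChunkTiming` and `overall_tok_per_s` are illustrative names and defaults, not taken from the codebase. Recording each chunk separately, rather than one total, makes it visible whether throughput drops as the cache fills on long contexts.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class ChunkTiming:
    start: int      # offset of the chunk's first token in the prompt
    n_tokens: int   # number of tokens processed in this chunk
    seconds: float  # wall-clock time spent in prefill_chunk for this chunk

    @property
    def tok_per_s(self) -> float:
        # Per-chunk throughput; guard against a zero-duration measurement.
        return self.n_tokens / self.seconds if self.seconds > 0 else float("inf")


def timed_chunked_prefill(
    tokens: Sequence[int],
    prefill_chunk: Callable[[Sequence[int]], None],  # hypothetical hook into the model
    chunk_size: int = 512,                           # illustrative default
    clock: Callable[[], float] = time.perf_counter,  # injectable for deterministic tests
) -> List[ChunkTiming]:
    """Run prefill in fixed-size chunks and record the wall-clock time of each."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    timings: List[ChunkTiming] = []
    for start in range(0, len(tokens), chunk_size):
        chunk = tokens[start:start + chunk_size]
        t0 = clock()
        prefill_chunk(chunk)  # must block until the chunk is actually computed
        timings.append(ChunkTiming(start, len(chunk), clock() - t0))
    return timings


def overall_tok_per_s(timings: List[ChunkTiming]) -> float:
    """Aggregate throughput across all chunks (total tokens / total seconds)."""
    total_tokens = sum(t.n_tokens for t in timings)
    total_seconds = sum(t.seconds for t in timings)
    return total_tokens / total_seconds if total_seconds > 0 else float("inf")
```

If the backend evaluates lazily (MLX, for example), `prefill_chunk` needs to force evaluation before returning (e.g. via `mx.eval` on the cache/output); otherwise the timer only measures graph construction. Printing `t.start` against `t.tok_per_s` for a few prompt lengths would give the per-offset data this issue is asking for.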