generate() doesn't use KV cache — re-runs full forward pass every token #12
Problem
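TergentModel.generate() (model.py:165-178) re-runs the entire forward pass over the full sequence for every generated token. With 20 layers and ternary weights, this is extremely slow: every step pushes all earlier tokens through every layer again, so the work per generated token grows with the sequence length.
TransformerBlock.forward() already returns a KV cache (kv), but generate() discards it.
Impact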
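As a rough illustration of the cost, the sketch below counts how many token positions each layer has to process with and without a cache. The function name and the lengths used (a 32-token prompt, 256 generated tokens) are made up for the example and are not measurements from Tergent.

```python
def positions_processed(prompt_len: int, n_new: int, cached: bool) -> int:
    """Token positions pushed through each layer while generating n_new tokens.

    Hypothetical helper for illustration; it counts work, it does not time anything.
    """
    if n_new <= 0:
        return 0
    if cached:
        # One pass over the prompt, then a single new position per later step.
        return prompt_len + (n_new - 1)
    # Without a cache, step t re-feeds the prompt plus the t tokens generated so far.
    return sum(prompt_len + t for t in range(n_new))


uncached = positions_processed(32, 256, cached=False)  # 40832
cached = positions_processed(32, 256, cached=True)     # 287
print(uncached, cached)
```

The count is per layer, so the 20 layers scale both numbers equally. Attention over the cached keys still grows with sequence length in the cached case; what disappears is the repeated projection and feed-forward work for tokens that were already processed.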
Action needed
- Pass past_kv through generate() so each layer reuses its cached KV
- Update forward() to accept and return per-layer KV caches
- Or rewrite generate() as a method that handles caching explicitly
Files
tergent/model.py:165-178
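
The caching approach listed under Action needed could look roughly like the sketch below. It is a NumPy toy, not the Tergent code: ToyBlock, ToyModel, generate_cached and generate_uncached are hypothetical names, the block is a single attention head with no positional encoding or feed-forward layer, and decoding is greedy. The only parts taken from the issue are that a block returns its kv and that past_kv is passed back in per layer.

```python
import numpy as np


class ToyBlock:
    """Single-head attention block; a hypothetical stand-in for TransformerBlock."""

    def __init__(self, dim, rng):
        self.wq, self.wk, self.wv = (
            rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(3)
        )

    def forward(self, x, past_kv=None):
        # x holds only the new positions: shape (new_tokens, dim).
        q, k, v = x @ self.wq, x @ self.wk, x @ self.wv
        if past_kv is not None:
            # Prepend keys/values cached from earlier steps instead of recomputing them.
            k = np.concatenate([past_kv[0], k], axis=0)
            v = np.concatenate([past_kv[1], v], axis=0)
        offset = k.shape[0] - q.shape[0]  # number of cached positions
        scores = q @ k.T / np.sqrt(x.shape[1])
        # Causal mask: the query at absolute position offset + i must not see later keys.
        key_pos = np.arange(k.shape[0])[None, :]
        query_pos = np.arange(q.shape[0])[:, None] + offset
        scores = np.where(key_pos > query_pos, -np.inf, scores)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return x + weights @ v, (k, v)


class ToyModel:
    def __init__(self, vocab=50, dim=16, n_layers=3, seed=0):
        rng = np.random.default_rng(seed)
        self.embed = rng.standard_normal((vocab, dim))
        self.blocks = [ToyBlock(dim, rng) for _ in range(n_layers)]

    def forward(self, tokens, past_kv=None):
        """Accept and return one (k, v) pair per layer."""
        x = self.embed[np.asarray(tokens)]
        past_kv = past_kv or [None] * len(self.blocks)
        new_kv = []
        for block, layer_past in zip(self.blocks, past_kv):
            x, kv = block.forward(x, layer_past)
            new_kv.append(kv)
        return x @ self.embed.T, new_kv

    def generate_uncached(self, prompt, n_new):
        tokens = list(prompt)
        for _ in range(n_new):
            logits, _ = self.forward(tokens)  # reprocesses every token, cache discarded
            tokens.append(int(logits[-1].argmax()))
        return tokens

    def generate_cached(self, prompt, n_new):
        tokens = list(prompt)
        inputs, past_kv = list(prompt), None
        for _ in range(n_new):
            # First step feeds the whole prompt; later steps feed only the newest token.
            logits, past_kv = self.forward(inputs, past_kv)
            tokens.append(int(logits[-1].argmax()))
            inputs = tokens[-1:]
        return tokens


model = ToyModel()
assert model.generate_cached([1, 2, 3], 8) == model.generate_uncached([1, 2, 3], 8)
```

The final assertion is the kind of regression check that seems worth keeping when the real generate() is changed: cached and uncached greedy decoding should produce the same tokens. A real implementation would also need to offset any positional encoding by the cached length, which this toy leaves out.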