KV cache IO scaling with context length #32
Problem
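During decoding, each generated token reads the full KV cache of every full-attention layer and appends one new entry to it, so KV cache IO grows roughly linearly with context length while weight reads per token stay constant. This investigation tracks the context length at which KV cache IO becomes significant relative to weight reads.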
Data (9B Q4_0, Qwen3.5: 8 full-attention layers, 24 GatedDeltaNet)
Observations
Architecture note
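Only 8 of the 32 layers use full attention and therefore keep a KV cache. The other 24 are GatedDeltaNet layers, which keep a recurrent state rather than a KV cache, so only the 8 full-attention layers contribute KV cache IO.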
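To make the comparison concrete, here is a rough back-of-the-envelope sketch of KV cache read volume per decoded token versus weight read volume. Only the layer count (8 full-attention layers) and the 9B Q4_0 model size come from this issue. The KV head count, head dimension, and f16 cache type are placeholder assumptions, not the model's confirmed configuration, and the weight size assumes every parameter is stored at Q4_0's 4.5 bits per weight, which a real GGUF file will not match exactly. Substitute measured values before drawing conclusions.

```python
import math


def kv_bytes_per_token(n_attn_layers, n_kv_heads, head_dim, bytes_per_elem):
    """Bytes of KV cache stored per context token.

    K and V each hold n_kv_heads * head_dim elements per attention layer.
    """
    return 2 * n_attn_layers * n_kv_heads * head_dim * bytes_per_elem


def kv_read_bytes(context_len, per_token_bytes):
    """Bytes of KV cache read to decode one token at a given context length."""
    return context_len * per_token_bytes


def crossover_context(weight_bytes, fraction, per_token_bytes):
    """Smallest context length at which KV reads reach `fraction` of weight reads."""
    return math.ceil(fraction * weight_bytes / per_token_bytes)


if __name__ == "__main__":
    # From the issue: 8 full-attention layers, 9B parameters, Q4_0.
    N_ATTN_LAYERS = 8
    # Q4_0 packs 32 weights into 18 bytes (4.5 bits per weight).
    # Assumption: all 9e9 parameters are stored as Q4_0.
    WEIGHT_BYTES = 9e9 * 18 / 32

    # Hypothetical placeholders, NOT taken from the model config:
    N_KV_HEADS = 4      # assumed number of KV heads
    HEAD_DIM = 256      # assumed head dimension
    BYTES_PER_ELEM = 2  # assumed f16 KV cache

    per_token = kv_bytes_per_token(N_ATTN_LAYERS, N_KV_HEADS, HEAD_DIM, BYTES_PER_ELEM)

    for ctx in (1024, 8192, 32768, 131072):
        ratio = kv_read_bytes(ctx, per_token) / WEIGHT_BYTES
        print(f"ctx={ctx:>7}  KV reads / weight reads = {ratio:.3f}")

    for frac in (0.1, 0.5, 1.0):
        ctx = crossover_context(WEIGHT_BYTES, frac, per_token)
        print(f"KV reads reach {frac:.0%} of weight reads at ~{ctx} tokens")
```

The model here counts reads only. The per-token KV write is a single entry per full-attention layer and is negligible next to the read of the whole cache. A quantized KV cache type would lower `BYTES_PER_ELEM` and push the crossover to longer contexts.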
Next steps