Investigate GET_ROWS overhead (678 MB/tick at 9B) #30
Problem
GET_ROWS reads 678 MB per tick at 9B -- second only to MUL_MAT (4.8 GB). This includes:
Data (9B Q4_0, ctx=256)
Questions
Context scaling
GET_ROWS bytes_in stays constant at 678 MB regardless of context length, because it depends only on model size. It therefore does not become more dominant at long contexts.
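The arithmetic behind this can be sketched as follows. This is a minimal illustration, not a measurement: it uses only the two figures quoted in this issue (678 MB for GET_ROWS, 4.8 GB for MUL_MAT, taken here as 4800 MB), and `ctx_dependent_mb` is a hypothetical placeholder for any traffic that grows with context length. Nothing in this issue quantifies that traffic.

```python
# Figures quoted in this issue (9B Q4_0, ctx=256), in MB read per tick.
GET_ROWS_MB = 678.0
MUL_MAT_MB = 4800.0  # 4.8 GB, assuming 1 GB = 1000 MB


def get_rows_share(ctx_dependent_mb: float = 0.0) -> float:
    """Fraction of bytes read per tick that GET_ROWS accounts for.

    ctx_dependent_mb is a HYPOTHETICAL extra term standing in for
    traffic that grows with context length; it is not measured here.
    Only GET_ROWS and MUL_MAT are counted otherwise.
    """
    total = GET_ROWS_MB + MUL_MAT_MB + ctx_dependent_mb
    return GET_ROWS_MB / total


if __name__ == "__main__":
    print(f"GET_ROWS / MUL_MAT = {GET_ROWS_MB / MUL_MAT_MB:.3f}")
    # Hypothetical values for the context-dependent term, for illustration only.
    for extra_mb in (0.0, 500.0, 2000.0):
        print(f"extra={extra_mb:7.1f} MB  share={get_rows_share(extra_mb):.3f}")
```

Because the GET_ROWS term is fixed, its share can only stay flat or shrink as any context-dependent term grows, which is the point made above.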