IQ4_XS tg4096 anomaly (45 vs 76 tok/s on 4B) #33

Closed
opened 2026-04-30 18:11:36 +02:00 by sleepy · 0 comments
Owner

Not reproducible. IQ4_XS now performs on par with Q4_0 (53.25 vs 52.05 tok/s at tg4096 on 9B). The original anomaly was likely caused by Metal driver state, GPU caching, or thermal conditions.
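To put "on par" in numbers, here is a minimal sketch that compares the two pairs of figures quoted in this issue. The helper name `relative_gap` is hypothetical and not part of llama.cpp. The gap is measured against the slower value in each pair, which is an assumption about how to normalize. Note also that the original report was on a 4B model and the retest on a 9B model, so the two gaps are only indicative.

```python
def relative_gap(a: float, b: float) -> float:
    """Relative difference between two throughputs, as a fraction of the slower one."""
    return abs(a - b) / min(a, b)


# Figures quoted in this issue (tok/s at tg4096).
original = relative_gap(76.0, 45.0)   # issue title: 45 vs 76 on 4B
retest = relative_gap(53.25, 52.05)   # retest: IQ4_XS vs Q4_0 on 9B

print(f"original gap: {original:.1%}")
print(f"retest gap:   {retest:.1%}")
```

The original gap comes out at roughly 69%, while the retest gap is a little over 2%, which is plausibly within run-to-run noise for this kind of benchmark.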

sleepy added the bug label 2026-04-30 18:11:36 +02:00
sleepy changed title from IQ4_XS tg4096 anomaly - 45 vs 76 tok per s on 4B to IQ4_XS tg4096 anomaly (45 vs 76 tok/s on 4B) 2026-04-30 18:16:38 +02:00

Reference: sleepy/llama.cpp#33