Reach 37+ tok/s decode target (match llama.cpp) #34
Target
Match or beat the llama.cpp baseline of 37.3 tok/s decode on Qwen3.5-4B BF16 (M4 Max, 36GB).
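A minimal sketch of how the target could be checked, assuming decode throughput means generated tokens divided by decode wall time, with prefill excluded. The helper names `decode_tok_per_s` and `meets_target` are hypothetical and not part of this repo; only the 37.3 tok/s figure comes from this issue.

```python
# Baseline quoted in this issue: llama.cpp decode on Qwen3.5-4B BF16, M4 Max 36GB.
BASELINE_TOK_S = 37.3


def decode_tok_per_s(n_decoded_tokens: int, decode_seconds: float) -> float:
    """Return decode throughput in tokens per second.

    Assumes decode_seconds covers only the token-generation phase
    (prefill time excluded), which is an assumption of this sketch.
    """
    if decode_seconds <= 0:
        raise ValueError("decode_seconds must be positive")
    return n_decoded_tokens / decode_seconds


def meets_target(tok_per_s: float, baseline: float = BASELINE_TOK_S) -> bool:
    """True when the measured throughput matches or beats the baseline."""
    return tok_per_s >= baseline


if __name__ == "__main__":
    # Illustrative numbers only, not a real measurement.
    rate = decode_tok_per_s(n_decoded_tokens=256, decode_seconds=8.0)
    print(f"{rate:.1f} tok/s, target met: {meets_target(rate)}")
```

Using `>=` mirrors the "match or beat" wording; a run-to-run noise margin is left out here and would need to be chosen separately.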
Current blockers
After those are fixed, likely optimizations needed:
Prerequisites
Make at most 2 attempts per optimization.