achieve MLX generation t/s — 22 t/s on 27B Q4_0 (M4 Max) #40
Goal
Match MLX generation throughput on 27B models. Target: 22 t/s on Qwen3.6-27B-Q4_0 at tg128 on M4 Max.
MLX Baseline
MLX-lm achieves ~22 t/s on 27B Q4_0. llama.cpp currently lags behind. This is the north star metric.
MLX Testing
MLX models are located at: ~/.omlx/models/
Run MLX-lm benchmarks for comparison.
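To make the comparison concrete, here is a minimal sketch of turning a timed generation run into t/s and measuring the gap to the 22 t/s target. The helper names (`tokens_per_second`, `gap_to_target`) are hypothetical and not part of llama.cpp or MLX-lm. The timing in the usage line is a placeholder, not a measured result.

```python
TARGET_TPS = 22.0  # target from this issue: tg128 on M4 Max


def tokens_per_second(n_tokens: int, elapsed_s: float) -> float:
    """Generation throughput for a run that produced n_tokens in elapsed_s seconds."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return n_tokens / elapsed_s


def gap_to_target(measured_tps: float, target_tps: float = TARGET_TPS) -> dict:
    """Return how far a measured throughput is from the target."""
    if measured_tps <= 0:
        raise ValueError("measured_tps must be positive")
    return {
        "ratio": measured_tps / target_tps,           # 1.0 means the target is met
        "speedup_needed": target_tps / measured_tps,  # factor still required
        "meets_target": measured_tps >= target_tps,
    }


if __name__ == "__main__":
    # Placeholder timing for illustration only: 128 tokens in 8.0 seconds.
    tps = tokens_per_second(128, 8.0)
    print(tps, gap_to_target(tps))
```

Both backends should be measured on the same model and the same test (tg128) so that the ratio is meaningful.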
Onboarding — What to Read
When No More Issues Remain
If all tracked issues are resolved and we still have not hit 22 t/s:
Progress Tracking
Record all benchmark results in BENCHMARKS.md with date and commit hash.
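One possible way to keep entries consistent is a small script that appends one row per run. This is only a sketch: the column layout of BENCHMARKS.md is an assumption (the issue does not specify one), and `current_commit`, `format_row`, and `append_result` are hypothetical helper names. The commit hash is read with `git rev-parse --short HEAD`.

```python
import subprocess
from datetime import date
from pathlib import Path


def current_commit() -> str:
    """Short hash of the checked-out commit (requires running inside a git repo)."""
    result = subprocess.run(
        ["git", "rev-parse", "--short", "HEAD"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def format_row(day: str, commit: str, model: str, test: str, tps: float) -> str:
    """One markdown table row; the column order is an assumed layout."""
    return f"| {day} | {commit} | {model} | {test} | {tps:.2f} |"


def append_result(path: str, row: str) -> None:
    """Append a row to the benchmark log, creating the file if it is missing."""
    with Path(path).open("a", encoding="utf-8") as f:
        f.write(row + "\n")


if __name__ == "__main__":
    measured_tps = 0.0  # replace with the measured tg128 result
    row = format_row(date.today().isoformat(), current_commit(),
                     "Qwen3.6-27B-Q4_0", "tg128", measured_tps)
    append_result("BENCHMARKS.md", row)
```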