Compare llama.cpp and MLX dispatch structure #36
Problem
MLX achieves roughly 23% higher effective bandwidth than llama.cpp (355 vs 289 GB/s) on Qwen3.6-27B at 14+ GiB. Both use the same accumulation type (F32), and thread organization is similar, so the gap likely comes from dispatch structure and memory access patterns.
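For reference, the relative gap follows directly from the two bandwidth figures quoted above. The sketch below is only illustrative arithmetic: the helper names are hypothetical, and the definition of effective bandwidth as weight bytes read per token times tokens per second is an assumption about how these numbers were derived, not something confirmed by either project.

```python
def relative_gap(fast_gbps: float, slow_gbps: float) -> float:
    """Fractional bandwidth advantage of `fast_gbps` over `slow_gbps`."""
    return fast_gbps / slow_gbps - 1.0


def effective_bandwidth_gbps(weight_bytes: float, tokens_per_second: float) -> float:
    """Assumed definition: every generated token streams all weights once.

    Returns GB/s using decimal gigabytes (1e9 bytes).
    """
    return weight_bytes * tokens_per_second / 1e9


# Figures quoted in this issue.
mlx_gbps = 355.0
llama_cpp_gbps = 289.0

gap = relative_gap(mlx_gbps, llama_cpp_gbps)
print(f"MLX advantage: {gap:.1%}")  # prints "MLX advantage: 22.8%"
```

Under that assumed definition, the same helper can turn a measured decode rate into a bandwidth figure, which makes it easier to compare profiles taken on models of different sizes.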
llama.cpp profile (9B Q4_0, ctx=256)
MLX investigation
Reference
MLX source: oMLX application bundle at mlx/include/mlx/backend/metal/kernels/