[metal] extend bin op fusion to MUL/SUB/DIV chains (#28) #38

sleepy commented

2026-04-30 20:16:59 +02:00

Summary

- Extend binary op fusion in `ggml_metal_op_bin` to handle MUL/SUB/DIV chains, not just ADD
- Update graph-level reorder in `ggml_graph_optimize` to consider MUL/SUB/DIV as fusion starting points
- Enforce same-type-only fusion for bin ops (kernel uses single `FC_OP` constant)
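
The same-type rule above can be sketched as a chain-length check. This is a minimal illustration, not the code from this PR: `BinOp`, `Node`, `fuse_count`, and the `max_fuse` cap of 8 are hypothetical stand-ins, and comparing `src1` shapes is only an assumed proxy for "matching src1 layouts".

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the ggml graph types (not the real definitions).
enum class BinOp { ADD, SUB, MUL, DIV, OTHER };

struct Node {
    BinOp op;
    int src0;                        // index of the node producing src0, or -1 for a graph input
    std::array<int64_t, 4> src1_ne;  // src1 shape, used here as a proxy for its layout
};

static bool is_bin_op(BinOp op) {
    return op == BinOp::ADD || op == BinOp::SUB || op == BinOp::MUL || op == BinOp::DIV;
}

// Returns how many nodes starting at `start` could share one kernel launch (nf).
// A node extends the chain only if it:
//   1. has the same op as the first node (one op constant selects the arithmetic),
//   2. consumes the previous node's result as src0,
//   3. has the same src1 shape as the first node.
// Returns 0 if `start` is out of range or is not a binary op.
static int fuse_count(const std::vector<Node> & g, std::size_t start, int max_fuse = 8) {
    assert(max_fuse >= 1);
    if (start >= g.size() || !is_bin_op(g[start].op)) {
        return 0;
    }
    int nf = 1;
    for (std::size_t i = start + 1; i < g.size() && nf < max_fuse; ++i) {
        const Node & cur = g[i];
        if (cur.op != g[start].op)                break;  // mixed ops never fuse
        if (cur.src0 != static_cast<int>(i - 1))  break;  // must chain through src0
        if (cur.src1_ne != g[start].src1_ne)      break;  // src1 shape must match
        ++nf;
    }
    return nf;
}
```

Under these assumptions, a MUL followed by two chained MULs with identical `src1` shapes gives `nf == 3`, while ADD, ADD, MUL stops at `nf == 2` because the third node has a different op. A graph in which every binary op is followed by a different op or a differently shaped `src1` yields `nf == 1` everywhere, which matches the "all nf=1" observation under Results.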

Results

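- No regression on 4B/9B benchmarks
- No practical impact: the current graph has no consecutive same-type bin op chains with matching src1 layouts (all nf=1), so nothing is actually fused yet
- The fusion logic no longer assumes ADD, rejects mixed-op chains, and is ready if the graph structure changes to produce same-type chains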

sleepy added 1 commit 2026-04-30 20:17:00 +02:00

[metal] extend bin op fusion to MUL/SUB/DIV chains (#28)

eeb79b026b

sleepy merged commit 8c532835be into master

2026-04-30 21:03:15 +02:00

sleepy commented

2026-04-30 21:04:03 +02:00

Merged via squash. Coherence test passed (token output byte-identical to master).

Reference: sleepy/llama.cpp#38