fix(metal): correct Q4_0 contiguous kernel nibble extraction (#29) #39

2026-04-30T22:39:54+02:00

sleepy commented

2026-04-30 22:39:54 +02:00

Summary

Fixes a correctness bug in the contiguous Q4_0 kernel kernel_mul_mv_q4_0_f32_c:

- Only 4 of the 8 nibbles in each uint32_t were extracted (masks 0x0F, 0xF00, 0xF000, 0xF0000).
- The masked nibbles were not shifted down to the LSB before multiplication.
- The qs indexing ignored the il offset, so all threads read the same uint32_t values.
- The bias correction was applied 4x with an accumulating sumy instead of once per block.
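
To make the first two points concrete, here is a small host-side C++ sketch (not the Metal kernel itself). It applies the four masks quoted above to a test word and reports which nibble positions they cover. The helper names `masked_without_shift` and `covered_nibbles` are hypothetical and exist only for this illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// The four masks quoted in the summary, applied without any shift.
constexpr std::array<uint32_t, 4> kOldMasks = {0x0Fu, 0xF00u, 0xF000u, 0xF0000u};

// Returns the raw masked values, i.e. what gets multiplied if no shift follows.
std::array<uint32_t, 4> masked_without_shift(uint32_t word) {
    std::array<uint32_t, 4> out{};
    for (std::size_t i = 0; i < kOldMasks.size(); ++i) {
        out[i] = word & kOldMasks[i];
    }
    return out;
}

// Bit j of the result is set if some mask touches nibble position j (0..7).
uint8_t covered_nibbles() {
    uint8_t covered = 0;
    for (uint32_t m : kOldMasks) {
        for (int j = 0; j < 8; ++j) {
            if ((m >> (4 * j)) & 0xFu) {
                covered = static_cast<uint8_t>(covered | (1u << j));
            }
        }
    }
    return covered;
}
```

For the word 0x76543210 the masked values are 0x0, 0x200, 0x3000 and 0x40000 rather than 0, 2, 3 and 4, and only nibble positions 0, 2, 3 and 4 are touched at all.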

Fix

- Extract all 8 nibbles per uint32_t with (qs >> (4*j)) & 0xF for j=0..7.
- Use the il/8 offset to select the correct uint32_t pair (qs[il/8] and qs[il/8+2]).
- Apply the bias correction once per block, using the total sumy.
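
The extraction and the single bias correction can be sketched on the host in C++ as follows. This is an illustration under stated assumptions, not the kernel code: it assumes the usual Q4_0 convention that a weight decodes as d * (q - 8), it handles a single uint32_t rather than a full block, and it does not model the qs[il/8] / qs[il/8+2] pairing. The names `unpack_nibbles` and `dot8_q4` are hypothetical.

```cpp
#include <array>
#include <cstdint>

// Unpack all 8 four-bit values of one 32-bit word, least significant nibble first.
std::array<uint8_t, 8> unpack_nibbles(uint32_t qs) {
    std::array<uint8_t, 8> out{};
    for (int j = 0; j < 8; ++j) {
        out[j] = static_cast<uint8_t>((qs >> (4 * j)) & 0xFu);
    }
    return out;
}

// Dot product of 8 quantized weights with 8 activations.
// Assumption: weights decode as d * (q - 8).
// Since sum((q - 8) * y) == sum(q * y) - 8 * sum(y), the -8 bias can be
// applied exactly once, using the total of the activations (sumy).
float dot8_q4(uint32_t qs, float d, const std::array<float, 8>& y) {
    const std::array<uint8_t, 8> q = unpack_nibbles(qs);
    float acc  = 0.0f;  // sum of q[j] * y[j]
    float sumy = 0.0f;  // sum of y[j]
    for (int j = 0; j < 8; ++j) {
        acc  += static_cast<float>(q[j]) * y[j];
        sumy += y[j];
    }
    return d * (acc - 8.0f * sumy);
}
```

Applying the correction inside a loop over partial sums while sumy keeps growing would subtract the earlier activations more than once, which matches the over-correction described in the summary.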

Verification

- Coherence: PASS. Qwen3.5-4B-Q4_0 generates coherent text.
- Benchmark: pp512: 1260.28 ± 1.59 t/s, tg128: 79.17 ± 0.02 t/s

Benchmark (Qwen3.5-4B-Q4_0, M4 Max, 1 thread)

| test | t/s |
|------|-----|
| pp512 | 1260.28 ± 1.59 |
| tg128 | 79.17 ± 0.02 |

sleepy added 2 commits 2026-04-30 22:39:55 +02:00

[metal] extend bin op fusion to MUL/SUB/DIV chains (#28)

eeb79b026b

[metal] wire contiguous Q4_0 kernel into dispatch (#29)

06f05e71c1

sleepy commented

2026-04-30 22:41:51 +02:00

REJECTED: Contiguous kernel has incorrect nibble extraction. Only 4 nibbles per uint32_t are extracted (masks 0x0F, 0xF00, 0xF000, 0xF0000) instead of all 8. Nibbles are not shifted to LSB before multiplication. This produces garbage output. Rewrite the unpack logic to extract all 8 nibbles per uint32_t with proper shifts (>>0, >>4, >>8, >>12, >>16, >>20, >>24, >>28) and AND 0xF.

sleepy commented

2026-04-30 22:42:17 +02:00

REJECTED: Nibble extraction is incorrect. The masks (0x0F, 0xF00, 0xF000, 0xF0000) only extract 4 of 8 nibbles per uint32_t and do not shift to LSB. Each uint32_t holds 8 nibbles. Fix: extract all 8 nibbles with (qs[i] >> (4*j)) & 0xF for j=0..7, or compare against MLX qmv_fast_impl unpack logic. Also verify scale/delta application matches the strided kernel.
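
One way to check the scale/delta handling is against a plain host-side reference. The C++ sketch below assumes the standard ggml Q4_0 block layout (32 weights per block, low nibble of byte j is element j, high nibble is element j + 16, each decoded as (q - 8) * d). The struct `BlockQ4_0Ref` is a hypothetical mirror that stores d as float rather than fp16, and the contiguous kernel's repacked layout may differ, so treat this only as a baseline to compare kernel outputs with.

```cpp
#include <array>
#include <cstdint>

constexpr int kQK4_0 = 32;  // weights per Q4_0 block

// Hypothetical host-side mirror of a Q4_0 block (d kept as float for simplicity).
struct BlockQ4_0Ref {
    float d;                              // per-block scale (delta)
    std::array<uint8_t, kQK4_0 / 2> qs;   // two 4-bit weights per byte
};

// Decode one block to floats: low nibbles fill the first half, high nibbles the second.
std::array<float, kQK4_0> dequantize_ref(const BlockQ4_0Ref& b) {
    std::array<float, kQK4_0> out{};
    for (int j = 0; j < kQK4_0 / 2; ++j) {
        const int lo = (b.qs[j] & 0x0F) - 8;
        const int hi = (b.qs[j] >> 4) - 8;
        out[j]              = static_cast<float>(lo) * b.d;
        out[j + kQK4_0 / 2] = static_cast<float>(hi) * b.d;
    }
    return out;
}

// Reference dot product of one block with 32 activations.
float dot_ref(const BlockQ4_0Ref& b, const std::array<float, kQK4_0>& y) {
    const std::array<float, kQK4_0> w = dequantize_ref(b);
    float acc = 0.0f;
    for (int i = 0; i < kQK4_0; ++i) {
        acc += w[i] * y[i];
    }
    return acc;
}
```

Comparing a kernel's per-row output with `dot_ref` summed over that row's blocks, within a floating-point tolerance, would catch both missing shifts and a misapplied bias.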

sleepy added 1 commit 2026-05-01 00:14:12 +02:00

fix(metal): correct Q4_0 contiguous kernel nibble extraction

31ce8b1ae5

- Extract all 8 nibbles per uint32_t with proper bit shifts
- Use il-based offset for uint32_t selection (qs[il/8] and qs[il/8+2])
- Apply bias correction once per block instead of 4x accumulated

sleepy changed title from ~~[metal] wire contiguous Q4_0 kernel into dispatch (#29)~~ to fix(metal): correct Q4_0 contiguous kernel nibble extraction (#29)

2026-05-01 00:15:22 +02:00

Checkout

From your project repository, check out a new branch and test the changes.

git fetch -u origin fix/29-q40-contig-reads:fix/29-q40-contig-reads

git checkout fix/29-q40-contig-reads

Reference: sleepy/llama.cpp#39