fix(metal): correct Q4_0 contiguous kernel nibble extraction (#29) #39

Open
sleepy wants to merge 3 commits from fix/29-q40-contig-reads into master
Owner

Summary

Fix correctness bug in kernel_mul_mv_q4_0_f32_c contiguous Q4_0 kernel:

  • Only 4 of 8 nibbles per uint32_t were extracted (masks 0x0F, 0xF00, 0xF000, 0xF0000)
  • Nibbles not shifted to LSB before multiplication
  • qs indexing ignored il offset — all threads read same uint32_t values
  • Bias correction applied 4x with accumulating sumy instead of once per block

Fix

  • Extract all 8 nibbles per uint32_t with (qs >> (4*j)) & 0xF for j=0..7
  • Use il/8 offset to select correct uint32_t pair (qs[il/8] and qs[il/8+2])
  • Apply bias correction once per block with total sumy

Verification

  • Coherence: PASS — Qwen3.5-4B-Q4_0 generates coherent text
  • Benchmark: pp512: 1260.28 ± 1.59 t/s, tg128: 79.17 ± 0.02 t/s

Benchmark (Qwen3.5-4B-Q4_0, M4 Max, 1 thread)

test t/s
pp512 1260.28 ± 1.59
tg128 79.17 ± 0.02
## Summary Fix correctness bug in `kernel_mul_mv_q4_0_f32_c` contiguous Q4_0 kernel: - **Only 4 of 8 nibbles per uint32_t were extracted** (masks 0x0F, 0xF00, 0xF000, 0xF0000) - **Nibbles not shifted to LSB** before multiplication - **qs indexing ignored il offset** — all threads read same uint32_t values - **Bias correction applied 4x** with accumulating sumy instead of once per block ## Fix - Extract all 8 nibbles per uint32_t with `(qs >> (4*j)) & 0xF` for j=0..7 - Use `il/8` offset to select correct uint32_t pair (qs[il/8] and qs[il/8+2]) - Apply bias correction once per block with total sumy ## Verification - Coherence: PASS — Qwen3.5-4B-Q4_0 generates coherent text - Benchmark: pp512: 1260.28 ± 1.59 t/s, tg128: 79.17 ± 0.02 t/s ## Benchmark (Qwen3.5-4B-Q4_0, M4 Max, 1 thread) | test | t/s | |------|-----| | pp512 | 1260.28 ± 1.59 | | tg128 | 79.17 ± 0.02 |
sleepy added 2 commits 2026-04-30 22:39:55 +02:00
[metal] extend bin op fusion to MUL/SUB/DIV chains (#28)
CI (apple) / macOS-latest-ios (pull_request) Waiting to run
CI (apple) / macos-latest-ios-xcode (pull_request) Waiting to run
CI (apple) / macOS-latest-tvos (pull_request) Waiting to run
CI (apple) / macOS-latest-visionos (pull_request) Waiting to run
CI (apple) / macOS-latest-swift (generic/platform=iOS) (pull_request) Blocked by required conditions
CI (apple) / macOS-latest-swift (generic/platform=macOS) (pull_request) Blocked by required conditions
CI (apple) / macOS-latest-swift (generic/platform=tvOS) (pull_request) Blocked by required conditions
CI (self-hosted) / ggml-ci-nvidia-cuda (pull_request) Waiting to run
CI (self-hosted) / ggml-ci-nvidia-vulkan-cm (pull_request) Waiting to run
CI (self-hosted) / ggml-ci-nvidia-vulkan-cm2 (pull_request) Waiting to run
CI (self-hosted) / ggml-ci-mac-metal (pull_request) Waiting to run
CI (self-hosted) / ggml-ci-mac-webgpu (pull_request) Waiting to run
CI (self-hosted) / ggml-ci-mac-vulkan (pull_request) Waiting to run
CI (self-hosted) / ggml-ci-linux-intel-vulkan (pull_request) Waiting to run
CI (self-hosted) / ggml-ci-win-intel-vulkan (pull_request) Waiting to run
CI (self-hosted) / ggml-ci-intel-openvino-gpu-low-perf (pull_request) Waiting to run
CI / build-cmake-pkg (pull_request) Waiting to run
CI / macOS-latest-arm64 (pull_request) Waiting to run
CI / macOS-latest-x64 (pull_request) Waiting to run
CI / macOS-latest-arm64-webgpu (pull_request) Waiting to run
CI / ubuntu-cpu (arm64, ubuntu-24.04-arm) (pull_request) Waiting to run
CI / ubuntu-cpu (ppc64le, ubuntu-24.04-ppc64le) (pull_request) Waiting to run
CI / ubuntu-cpu (s390x, ubuntu-24.04-s390x) (pull_request) Waiting to run
CI / ubuntu-cpu (x64, ubuntu-22.04) (pull_request) Waiting to run
CI / android-arm64 (pull_request) Waiting to run
CI / ubuntu-latest-rpc (pull_request) Waiting to run
CI / ubuntu-24-vulkan (arm64, ubuntu-24.04-arm) (pull_request) Waiting to run
CI / ubuntu-24-vulkan (x64, ubuntu-24.04) (pull_request) Waiting to run
CI / ubuntu-24-webgpu (pull_request) Waiting to run
CI / ubuntu-24-webgpu-wasm (pull_request) Waiting to run
CI / ubuntu-22-hip (pull_request) Waiting to run
CI / ubuntu-22-musa (pull_request) Waiting to run
CI / windows-latest (arm64, llvm-arm64, -G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmake/arm64-windows-llvm.cmake -DGGML_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON) (pull_request) Waiting to run
CI / windows-latest (arm64, llvm-arm64-opencl-adreno, -G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmake/arm64-windows-llvm.cmake -DCMAKE_PREFIX_PATH="$env:RUNNER_TEMP/opencl-arm64-release" -DGGML_OPENCL=ON -DGGML_OPENCL_USE_ADRENO_KERNELS=ON) (pull_request) Waiting to run
CI / windows-latest (x64, cpu-x64 (static), -G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmake/x64-windows-llvm.cmake -DGGML_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DGGML_RPC=ON -DBUILD_SHARED_LIBS=OFF) (pull_request) Waiting to run
CI / windows-latest (x64, openblas-x64, -G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmake/x64-windows-llvm.cmake -DGGML_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DGGML_RPC=ON -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON -DGGML_OPENMP=OFF -DGGML_BLAS=ON -DG… (pull_request) Waiting to run
CI / windows-latest (x64, vulkan-x64, -DCMAKE_BUILD_TYPE=Release -DGGML_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DGGML_RPC=ON -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON -DGGML_VULKAN=ON) (pull_request) Waiting to run
CI / ubuntu-latest-cuda (pull_request) Waiting to run
CI / windows-2022-cuda (12.4) (pull_request) Waiting to run
CI / windows-latest-hip (pull_request) Waiting to run
CI / ubuntu-cpu-riscv64-native (pull_request) Waiting to run
CI / ggml-ci-x64-cpu-low-perf (pull_request) Waiting to run
CI / ggml-ci-arm64-cpu-low-perf (pull_request) Waiting to run
CI / ggml-ci-x64-cpu-high-perf (pull_request) Waiting to run
CI / ggml-ci-arm64-cpu-high-perf (pull_request) Waiting to run
CI / ggml-ci-arm64-cpu-high-perf-sve (pull_request) Waiting to run
CI / ggml-ci-arm64-cpu-kleidiai (pull_request) Waiting to run
CI / ggml-ci-arm64-cpu-kleidiai-graviton4 (pull_request) Waiting to run
EditorConfig Checker / editorconfig (pull_request) Waiting to run
Server / server (default) (pull_request) Waiting to run
Server / server (backend-sampling) (pull_request) Waiting to run
Server / server-windows (pull_request) Waiting to run
Pull Request Labeler / labeler (pull_request_target) Waiting to run
eeb79b026b
[metal] wire contiguous Q4_0 kernel into dispatch (#29)
Pull Request Labeler / labeler (pull_request_target) Waiting to run
CI (apple) / macOS-latest-ios (pull_request) Has been cancelled
CI (apple) / macos-latest-ios-xcode (pull_request) Has been cancelled
CI (apple) / macOS-latest-tvos (pull_request) Has been cancelled
CI (apple) / macOS-latest-visionos (pull_request) Has been cancelled
CI (apple) / macOS-latest-swift (generic/platform=iOS) (pull_request) Has been cancelled
CI (apple) / macOS-latest-swift (generic/platform=macOS) (pull_request) Has been cancelled
CI (apple) / macOS-latest-swift (generic/platform=tvOS) (pull_request) Has been cancelled
CI (self-hosted) / ggml-ci-nvidia-cuda (pull_request) Has been cancelled
CI (self-hosted) / ggml-ci-nvidia-vulkan-cm (pull_request) Has been cancelled
CI (self-hosted) / ggml-ci-nvidia-vulkan-cm2 (pull_request) Has been cancelled
CI (self-hosted) / ggml-ci-mac-metal (pull_request) Has been cancelled
CI (self-hosted) / ggml-ci-mac-webgpu (pull_request) Has been cancelled
CI (self-hosted) / ggml-ci-mac-vulkan (pull_request) Has been cancelled
CI (self-hosted) / ggml-ci-linux-intel-vulkan (pull_request) Has been cancelled
CI (self-hosted) / ggml-ci-win-intel-vulkan (pull_request) Has been cancelled
CI (self-hosted) / ggml-ci-intel-openvino-gpu-low-perf (pull_request) Has been cancelled
CI / build-cmake-pkg (pull_request) Has been cancelled
CI / macOS-latest-arm64 (pull_request) Has been cancelled
CI / macOS-latest-x64 (pull_request) Has been cancelled
CI / macOS-latest-arm64-webgpu (pull_request) Has been cancelled
CI / ubuntu-cpu (arm64, ubuntu-24.04-arm) (pull_request) Has been cancelled
CI / ubuntu-cpu (ppc64le, ubuntu-24.04-ppc64le) (pull_request) Has been cancelled
CI / ubuntu-cpu (s390x, ubuntu-24.04-s390x) (pull_request) Has been cancelled
CI / ubuntu-cpu (x64, ubuntu-22.04) (pull_request) Has been cancelled
CI / android-arm64 (pull_request) Has been cancelled
CI / ubuntu-latest-rpc (pull_request) Has been cancelled
CI / ubuntu-24-vulkan (arm64, ubuntu-24.04-arm) (pull_request) Has been cancelled
CI / ubuntu-24-vulkan (x64, ubuntu-24.04) (pull_request) Has been cancelled
CI / ubuntu-24-webgpu (pull_request) Has been cancelled
CI / ubuntu-24-webgpu-wasm (pull_request) Has been cancelled
CI / ubuntu-22-hip (pull_request) Has been cancelled
CI / ubuntu-22-musa (pull_request) Has been cancelled
CI / windows-latest (arm64, llvm-arm64, -G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmake/arm64-windows-llvm.cmake -DGGML_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON) (pull_request) Has been cancelled
CI / windows-latest (arm64, llvm-arm64-opencl-adreno, -G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmake/arm64-windows-llvm.cmake -DCMAKE_PREFIX_PATH="$env:RUNNER_TEMP/opencl-arm64-release" -DGGML_OPENCL=ON -DGGML_OPENCL_USE_ADRENO_KERNELS=ON) (pull_request) Has been cancelled
CI / windows-latest (x64, cpu-x64 (static), -G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmake/x64-windows-llvm.cmake -DGGML_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DGGML_RPC=ON -DBUILD_SHARED_LIBS=OFF) (pull_request) Has been cancelled
CI / windows-latest (x64, openblas-x64, -G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmake/x64-windows-llvm.cmake -DGGML_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DGGML_RPC=ON -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON -DGGML_OPENMP=OFF -DGGML_BLAS=ON -DG… (pull_request) Has been cancelled
CI / windows-latest (x64, vulkan-x64, -DCMAKE_BUILD_TYPE=Release -DGGML_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DGGML_RPC=ON -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON -DGGML_VULKAN=ON) (pull_request) Has been cancelled
CI / ubuntu-latest-cuda (pull_request) Has been cancelled
CI / windows-2022-cuda (12.4) (pull_request) Has been cancelled
CI / windows-latest-hip (pull_request) Has been cancelled
CI / ubuntu-cpu-riscv64-native (pull_request) Has been cancelled
CI / ggml-ci-x64-cpu-low-perf (pull_request) Has been cancelled
CI / ggml-ci-arm64-cpu-low-perf (pull_request) Has been cancelled
CI / ggml-ci-x64-cpu-high-perf (pull_request) Has been cancelled
CI / ggml-ci-arm64-cpu-high-perf (pull_request) Has been cancelled
CI / ggml-ci-arm64-cpu-high-perf-sve (pull_request) Has been cancelled
CI / ggml-ci-arm64-cpu-kleidiai (pull_request) Has been cancelled
CI / ggml-ci-arm64-cpu-kleidiai-graviton4 (pull_request) Has been cancelled
EditorConfig Checker / editorconfig (pull_request) Has been cancelled
Server / server (default) (pull_request) Has been cancelled
Server / server (backend-sampling) (pull_request) Has been cancelled
Server / server-windows (pull_request) Has been cancelled
06f05e71c1
Author
Owner

REJECTED: Contiguous kernel has incorrect nibble extraction. Only 4 nibbles per uint32_t are extracted (masks 0x0F, 0xF00, 0xF000, 0xF0000) instead of all 8. Nibbles are not shifted to LSB before multiplication. This produces garbage output. Rewrite the unpack logic to extract all 8 nibbles per uint32_t with proper shifts (>>0, >>4, >>8, >>12, >>16, >>20, >>24, >>28) and AND 0xF.

REJECTED: Contiguous kernel has incorrect nibble extraction. Only 4 nibbles per uint32_t are extracted (masks 0x0F, 0xF00, 0xF000, 0xF0000) instead of all 8. Nibbles are not shifted to LSB before multiplication. This produces garbage output. Rewrite the unpack logic to extract all 8 nibbles per uint32_t with proper shifts (>>0, >>4, >>8, >>12, >>16, >>20, >>24, >>28) and AND 0xF.
Author
Owner

REJECTED: Nibble extraction is incorrect. The masks (0x0F, 0xF00, 0xF000, 0xF0000) only extract 4 of 8 nibbles per uint32_t and do not shift to LSB. Each uint32_t holds 8 nibbles. Fix: extract all 8 nibbles with (qs[i] >> (4*j)) & 0xF for j=0..7, or compare against MLX qmv_fast_impl unpack logic. Also verify scale/delta application matches the strided kernel.

REJECTED: Nibble extraction is incorrect. The masks (0x0F, 0xF00, 0xF000, 0xF0000) only extract 4 of 8 nibbles per uint32_t and do not shift to LSB. Each uint32_t holds 8 nibbles. Fix: extract all 8 nibbles with (qs[i] >> (4*j)) & 0xF for j=0..7, or compare against MLX qmv_fast_impl unpack logic. Also verify scale/delta application matches the strided kernel.
sleepy added 1 commit 2026-05-01 00:14:12 +02:00
fix(metal): correct Q4_0 contiguous kernel nibble extraction
CI (apple) / macOS-latest-ios (pull_request) Waiting to run
CI (apple) / macos-latest-ios-xcode (pull_request) Waiting to run
CI (apple) / macOS-latest-tvos (pull_request) Waiting to run
CI (apple) / macOS-latest-visionos (pull_request) Waiting to run
CI (apple) / macOS-latest-swift (generic/platform=iOS) (pull_request) Blocked by required conditions
CI (apple) / macOS-latest-swift (generic/platform=macOS) (pull_request) Blocked by required conditions
CI (apple) / macOS-latest-swift (generic/platform=tvOS) (pull_request) Blocked by required conditions
CI (self-hosted) / ggml-ci-nvidia-cuda (pull_request) Waiting to run
CI (self-hosted) / ggml-ci-nvidia-vulkan-cm (pull_request) Waiting to run
CI (self-hosted) / ggml-ci-nvidia-vulkan-cm2 (pull_request) Waiting to run
CI (self-hosted) / ggml-ci-mac-metal (pull_request) Waiting to run
CI (self-hosted) / ggml-ci-mac-webgpu (pull_request) Waiting to run
CI (self-hosted) / ggml-ci-mac-vulkan (pull_request) Waiting to run
CI (self-hosted) / ggml-ci-linux-intel-vulkan (pull_request) Waiting to run
CI (self-hosted) / ggml-ci-win-intel-vulkan (pull_request) Waiting to run
CI (self-hosted) / ggml-ci-intel-openvino-gpu-low-perf (pull_request) Waiting to run
CI / build-cmake-pkg (pull_request) Waiting to run
CI / macOS-latest-arm64 (pull_request) Waiting to run
CI / macOS-latest-x64 (pull_request) Waiting to run
CI / macOS-latest-arm64-webgpu (pull_request) Waiting to run
CI / ubuntu-cpu (arm64, ubuntu-24.04-arm) (pull_request) Waiting to run
CI / ubuntu-cpu (ppc64le, ubuntu-24.04-ppc64le) (pull_request) Waiting to run
CI / ubuntu-cpu (s390x, ubuntu-24.04-s390x) (pull_request) Waiting to run
CI / ubuntu-cpu (x64, ubuntu-22.04) (pull_request) Waiting to run
CI / android-arm64 (pull_request) Waiting to run
CI / ubuntu-latest-rpc (pull_request) Waiting to run
CI / ubuntu-24-vulkan (arm64, ubuntu-24.04-arm) (pull_request) Waiting to run
CI / ubuntu-24-vulkan (x64, ubuntu-24.04) (pull_request) Waiting to run
CI / ubuntu-24-webgpu (pull_request) Waiting to run
CI / ubuntu-24-webgpu-wasm (pull_request) Waiting to run
CI / ubuntu-22-hip (pull_request) Waiting to run
CI / ubuntu-22-musa (pull_request) Waiting to run
CI / windows-latest (arm64, llvm-arm64, -G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmake/arm64-windows-llvm.cmake -DGGML_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON) (pull_request) Waiting to run
CI / windows-latest (arm64, llvm-arm64-opencl-adreno, -G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmake/arm64-windows-llvm.cmake -DCMAKE_PREFIX_PATH="$env:RUNNER_TEMP/opencl-arm64-release" -DGGML_OPENCL=ON -DGGML_OPENCL_USE_ADRENO_KERNELS=ON) (pull_request) Waiting to run
CI / windows-latest (x64, cpu-x64 (static), -G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmake/x64-windows-llvm.cmake -DGGML_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DGGML_RPC=ON -DBUILD_SHARED_LIBS=OFF) (pull_request) Waiting to run
CI / windows-latest (x64, openblas-x64, -G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmake/x64-windows-llvm.cmake -DGGML_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DGGML_RPC=ON -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON -DGGML_OPENMP=OFF -DGGML_BLAS=ON -DG… (pull_request) Waiting to run
CI / windows-latest (x64, vulkan-x64, -DCMAKE_BUILD_TYPE=Release -DGGML_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DGGML_RPC=ON -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON -DGGML_VULKAN=ON) (pull_request) Waiting to run
CI / ubuntu-latest-cuda (pull_request) Waiting to run
CI / windows-2022-cuda (12.4) (pull_request) Waiting to run
CI / windows-latest-hip (pull_request) Waiting to run
CI / ubuntu-cpu-riscv64-native (pull_request) Waiting to run
CI / ggml-ci-x64-cpu-low-perf (pull_request) Waiting to run
CI / ggml-ci-arm64-cpu-low-perf (pull_request) Waiting to run
CI / ggml-ci-x64-cpu-high-perf (pull_request) Waiting to run
CI / ggml-ci-arm64-cpu-high-perf (pull_request) Waiting to run
CI / ggml-ci-arm64-cpu-high-perf-sve (pull_request) Waiting to run
CI / ggml-ci-arm64-cpu-kleidiai (pull_request) Waiting to run
CI / ggml-ci-arm64-cpu-kleidiai-graviton4 (pull_request) Waiting to run
EditorConfig Checker / editorconfig (pull_request) Waiting to run
Server / server (default) (pull_request) Waiting to run
Server / server (backend-sampling) (pull_request) Waiting to run
Server / server-windows (pull_request) Waiting to run
Pull Request Labeler / labeler (pull_request_target) Waiting to run
31ce8b1ae5
- Extract all 8 nibbles per uint32_t with proper bit shifts
- Use il-based offset for uint32_t selection (qs[il/8] and qs[il/8+2])
- Apply bias correction once per block instead of 4x accumulated
sleepy changed title from [metal] wire contiguous Q4_0 kernel into dispatch (#29) to fix(metal): correct Q4_0 contiguous kernel nibble extraction (#29) 2026-05-01 00:15:22 +02:00
Some checks are pending
CI (apple) / macOS-latest-ios (pull_request) Waiting to run
CI (apple) / macos-latest-ios-xcode (pull_request) Waiting to run
CI (apple) / macOS-latest-tvos (pull_request) Waiting to run
CI (apple) / macOS-latest-visionos (pull_request) Waiting to run
CI (apple) / macOS-latest-swift (generic/platform=iOS) (pull_request) Blocked by required conditions
CI (apple) / macOS-latest-swift (generic/platform=macOS) (pull_request) Blocked by required conditions
CI (apple) / macOS-latest-swift (generic/platform=tvOS) (pull_request) Blocked by required conditions
CI (self-hosted) / ggml-ci-nvidia-cuda (pull_request) Waiting to run
CI (self-hosted) / ggml-ci-nvidia-vulkan-cm (pull_request) Waiting to run
CI (self-hosted) / ggml-ci-nvidia-vulkan-cm2 (pull_request) Waiting to run
CI (self-hosted) / ggml-ci-mac-metal (pull_request) Waiting to run
CI (self-hosted) / ggml-ci-mac-webgpu (pull_request) Waiting to run
CI (self-hosted) / ggml-ci-mac-vulkan (pull_request) Waiting to run
CI (self-hosted) / ggml-ci-linux-intel-vulkan (pull_request) Waiting to run
CI (self-hosted) / ggml-ci-win-intel-vulkan (pull_request) Waiting to run
CI (self-hosted) / ggml-ci-intel-openvino-gpu-low-perf (pull_request) Waiting to run
CI / build-cmake-pkg (pull_request) Waiting to run
CI / macOS-latest-arm64 (pull_request) Waiting to run
CI / macOS-latest-x64 (pull_request) Waiting to run
CI / macOS-latest-arm64-webgpu (pull_request) Waiting to run
CI / ubuntu-cpu (arm64, ubuntu-24.04-arm) (pull_request) Waiting to run
CI / ubuntu-cpu (ppc64le, ubuntu-24.04-ppc64le) (pull_request) Waiting to run
CI / ubuntu-cpu (s390x, ubuntu-24.04-s390x) (pull_request) Waiting to run
CI / ubuntu-cpu (x64, ubuntu-22.04) (pull_request) Waiting to run
CI / android-arm64 (pull_request) Waiting to run
CI / ubuntu-latest-rpc (pull_request) Waiting to run
CI / ubuntu-24-vulkan (arm64, ubuntu-24.04-arm) (pull_request) Waiting to run
CI / ubuntu-24-vulkan (x64, ubuntu-24.04) (pull_request) Waiting to run
CI / ubuntu-24-webgpu (pull_request) Waiting to run
CI / ubuntu-24-webgpu-wasm (pull_request) Waiting to run
CI / ubuntu-22-hip (pull_request) Waiting to run
CI / ubuntu-22-musa (pull_request) Waiting to run
CI / windows-latest (arm64, llvm-arm64, -G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmake/arm64-windows-llvm.cmake -DGGML_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON) (pull_request) Waiting to run
CI / windows-latest (arm64, llvm-arm64-opencl-adreno, -G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmake/arm64-windows-llvm.cmake -DCMAKE_PREFIX_PATH="$env:RUNNER_TEMP/opencl-arm64-release" -DGGML_OPENCL=ON -DGGML_OPENCL_USE_ADRENO_KERNELS=ON) (pull_request) Waiting to run
CI / windows-latest (x64, cpu-x64 (static), -G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmake/x64-windows-llvm.cmake -DGGML_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DGGML_RPC=ON -DBUILD_SHARED_LIBS=OFF) (pull_request) Waiting to run
CI / windows-latest (x64, openblas-x64, -G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmake/x64-windows-llvm.cmake -DGGML_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DGGML_RPC=ON -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON -DGGML_OPENMP=OFF -DGGML_BLAS=ON -DG… (pull_request) Waiting to run
CI / windows-latest (x64, vulkan-x64, -DCMAKE_BUILD_TYPE=Release -DGGML_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DGGML_RPC=ON -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON -DGGML_VULKAN=ON) (pull_request) Waiting to run
CI / ubuntu-latest-cuda (pull_request) Waiting to run
CI / windows-2022-cuda (12.4) (pull_request) Waiting to run
CI / windows-latest-hip (pull_request) Waiting to run
CI / ubuntu-cpu-riscv64-native (pull_request) Waiting to run
CI / ggml-ci-x64-cpu-low-perf (pull_request) Waiting to run
CI / ggml-ci-arm64-cpu-low-perf (pull_request) Waiting to run
CI / ggml-ci-x64-cpu-high-perf (pull_request) Waiting to run
CI / ggml-ci-arm64-cpu-high-perf (pull_request) Waiting to run
CI / ggml-ci-arm64-cpu-high-perf-sve (pull_request) Waiting to run
CI / ggml-ci-arm64-cpu-kleidiai (pull_request) Waiting to run
CI / ggml-ci-arm64-cpu-kleidiai-graviton4 (pull_request) Waiting to run
EditorConfig Checker / editorconfig (pull_request) Waiting to run
Server / server (default) (pull_request) Waiting to run
Server / server (backend-sampling) (pull_request) Waiting to run
Server / server-windows (pull_request) Waiting to run
Pull Request Labeler / labeler (pull_request_target) Waiting to run
This pull request can be merged automatically.
This branch is out-of-date with the base branch
You are not authorized to merge this pull request.
View command line instructions

Checkout

From your project repository, check out a new branch and test the changes.
git fetch -u origin fix/29-q40-contig-reads:fix/29-q40-contig-reads
git checkout fix/29-q40-contig-reads
Sign in to join this conversation.
No Reviewers
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: sleepy/llama.cpp#39