llama.cpp

Author	SHA1	Message	Date
Kaloyan Nikolov	eeb79b026b	[metal] extend bin op fusion to MUL/SUB/DIV chains (#28 )	2026-04-30 20:14:12 +02:00
Georgi Gerganov	1725e316c1	models : optimize qwen3next graph (#19375 ) * models : optimizing qwen3next graph * cont * wip * wip * wip * wip * wip * wip * wip * wip * wip * wip * cont : remove redundant q, g chunking * minor * minor * avoid passing masks around * avoid concats during chunking * naming + shapes * update names and use prefix to disable CUDA graphs	2026-02-14 12:57:36 +02:00
Georgi Gerganov	0644baefde	metal : improve concurrency (#19555 )	2026-02-13 07:35:57 +02:00
Georgi Gerganov	606a73f531	metal : fix loop bound in ggml_mem_ranges (#16412 )	2025-10-03 19:18:56 +03:00
Georgi Gerganov	dfcd53f7ec	metal : fuse NORM + MUL + ADD, support non-multiples of 4 (#16220 ) * metal : fuse NORM + MUL + ADD * metal : support norms of non-multiple of 4 * cont : fix comment [no ci]	2025-09-25 11:30:16 +03:00
Georgi Gerganov	4ea00794b8	metal : relax reorder conditions (#16216 )	2025-09-25 11:29:42 +03:00
Georgi Gerganov	a71ae3ba7a	ggml : add ggml_op_is_empty (#16122 ) * ggml : add ggml_op_is_empty * ggml : move to ggml-impl.h	2025-09-22 11:12:09 +03:00
Georgi Gerganov	0320ac5264	metal : refactor + optimize v2 (#15995 ) * metal : improve naming * metal : refactor device ggml-ci * cont : props ggml-ci * metal : apply ggml_mem_ranges_t ggml-ci * metal : remove GGML_METAL_USE_BF16 ggml-ci * metal : refactor device buffer ggml-ci * cont : fix naming * metal : sync before destroying the backend ggml-ci * metal : refactor context ggml-ci * metal : migrate ggml-metal.m to ggml-metal.cpp ggml-ci * metal : adjust ops API ggml-ci * metal : use C++ to store piplienes ggml-ci * metal : migrate ops to separate functions ggml-ci * metal : add ggml_metal_library_t ggml-ci * metal : improve naming ggml-ci * metal : cleanp ggml-ci * metal : add support for GGML_OP_LOG ggml-ci * metal : fix error handling ggml-ci	2025-09-17 20:38:12 +03:00
Georgi Gerganov	9dcd200d57	metal : remove memory pools (#15966 ) * metal : remove mem pool usage ggml-ci * metal : remove mem pool implementation ggml-ci * metal : take into account the actual allocated memory of the tensor ggml-ci * cont : use ggml_backend_buft_get_alloc_size ggml-ci * cont : improve, comments ggml-ci * cont : add functions for the extra tensor sizes * metal : add comments ggml-ci * metal : implement .get_alloc_size for the rest of the buffer types ggml-ci * metal : remove ggml_metal_heap ggml-ci	2025-09-14 22:02:32 +03:00
Georgi Gerganov	f161463a54	metal : allow ops to run concurrently (#15929 ) * metal : run graphs ops concurrently ggml-ci * cont : add flags for debugging and disabling concurrency ggml-ci * cont : refactor and handle fusing ggml-ci * cont : simplify - no need to use GPU address ggml-ci * cont : prepare mem ranges for reuse + add ggml-metal-common.cpp ggml-ci * cont : avoid redundant keywords in cpp [no ci] * metal : reorder graph for better concurrency ggml-ci * metal : fix race on mem pool buffers ggml-ci * cont : add env GGML_METAL_GRAPH_OPTIMIZE_DISABLE ggml-ci * cont : refactor, optimize, add comments ggml-ci * cont : refactor ggml-metal.m ggml-ci * minor : update logs [no ci]	2025-09-13 13:54:28 +03:00

10 Commits