llama.cpp



Developer-Ecosystem-Engineering d1649047a3 metal : optimize Metal Tensor API usage for GGML_OP_MUL_MAT (#20962)

* Optimize Metal Tensor API usage for matmul2d

Separates the Metal Tensor API (matmul2d) path in kernel_mul_mm into its own standalone kernel, gated by GGML_METAL_HAS_TENSOR.

The legacy simdgroup_matrix kernel is preserved under #else.

Previously both paths were interleaved via #ifdef blocks within a single kernel, forcing the tensor path to share the legacy kernel's data layout and threadgroup memory scheme. Splitting the kernel enabled memory and dispatch optimizations that weren't possible when the two paths shared code structure.

* cont : cleanup

* cont : cleanup

* cont : cleanup

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

2026-04-25 15:14:28 +03:00

cmake

ggml: backend-agnostic tensor parallelism (experimental) (#19378)

2026-04-09 16:42:19 +02:00

include

CUDA: manage NCCL communicators in context (#21891)

2026-04-15 15:58:40 +02:00

src

metal : optimize Metal Tensor API usage for GGML_OP_MUL_MAT (#20962)

2026-04-25 15:14:28 +03:00

.gitignore

vulkan : cmake integration (#8119)

2024-07-13 18:12:39 +02:00

CMakeLists.txt

HIP: flip GGML_HIP_GRAPHS to default on (#22254)

2026-04-23 02:34:31 +02:00