llama.cpp

Files

Jeff Bolz e06c3ab2bc vulkan: change gated_delta_net to shard a column across a subgroup (#20662)

* vulkan: change gated_delta_net to shard a column across a subgroup

This is based on https://github.com/ggml-org/llama.cpp/pull/20391. I used an
LLM to port the CUDA code to Vulkan and guided it to make various fixes to
work with Vulkan (e.g. handling different subgroup sizes, an unknown mapping of
subgroup to invocation id, optionally using subgroupAdd, etc.).

This fixes a perf regression from the transposing of the values in memory
(!20443).

* vulkan: Spread columns across fewer lanes to reduce the number of workgroups

2026-03-20 12:17:15 +01:00
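
The commit message above mentions sharding a column across a subgroup and optionally reducing with subgroupAdd. As a rough illustration only, the C++ sketch below simulates that idea on the CPU: each "lane" accumulates a strided slice of one column, and the per-lane partial sums are then combined with a tree reduction of the kind a shader might fall back to when subgroupAdd is not used. This is not the shader from the pull request; the function names (`lane_partial`, `reduce_tree`, `sharded_column_dot`), the strided row assignment, and the dot-product workload are all assumptions made for the example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: lane `lane` out of `lanes` handles rows
// lane, lane + lanes, lane + 2*lanes, ... of a single column.
float lane_partial(const std::vector<float>& col, const std::vector<float>& k,
                   std::size_t lane, std::size_t lanes) {
    float acc = 0.0f;
    for (std::size_t i = lane; i < col.size(); i += lanes) {
        acc += col[i] * k[i];
    }
    return acc;
}

// Hypothetical stand-in for a cross-lane reduction. On a GPU this step
// could be a single subgroupAdd; here it is a pairwise tree over the
// partial sums, which also works when the lane count is not a power of two.
float reduce_tree(std::vector<float> partials) {
    for (std::size_t stride = 1; stride < partials.size(); stride *= 2) {
        for (std::size_t i = 0; i + stride < partials.size(); i += 2 * stride) {
            partials[i] += partials[i + stride];
        }
    }
    return partials.empty() ? 0.0f : partials[0];
}

// Dot product of one column with `k`, sharded across `lanes` simulated lanes.
// Assumes col.size() == k.size() and lanes >= 1.
float sharded_column_dot(const std::vector<float>& col, const std::vector<float>& k,
                         std::size_t lanes) {
    std::vector<float> partials(lanes);
    for (std::size_t lane = 0; lane < lanes; ++lane) {
        partials[lane] = lane_partial(col, k, lane, lanes);
    }
    return reduce_tree(partials);
}
```

Calling `sharded_column_dot(col, k, lanes)` with different values of `lanes` should give the same sum up to floating-point rounding, which mirrors the second bullet of the commit: the number of lanes per column is a tuning choice that changes how the work is distributed, not the result.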

cmake

ggml: Skip backend library linking code when GGML_BACKEND_DL=ON (#15094)

2025-08-07 13:45:41 +02:00

include

ggml : restore ggml_type_sizef() to avoid major version bump (ggml/1441)

2026-03-18 15:17:28 +02:00

src

vulkan: change gated_delta_net to shard a column across a subgroup (#20662)

2026-03-20 12:17:15 +01:00

.gitignore

vulkan : cmake integration (#8119)

2024-07-13 18:12:39 +02:00

CMakeLists.txt

ggml : bump version to 0.9.8 (ggml/1442)

2026-03-18 15:17:28 +02:00