llama.cpp

Files

T

ProgenyAlpha 40c550d4f6 vulkan: fix SSM_CONV PP scaling with large ubatch sizes (#20379 )

* vulkan: optimize SSM_CONV workgroup dispatch for large ubatch

Tile tokens into 2D workgroups (32x16) to reduce workgroup launch
overhead at large ubatch sizes. Add vec4 fast path for nc=4 (common
d_conv size). Fixes PP performance degradation with ubatch > 512.

Ref: ggml-org/llama.cpp#18725

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* vulkan: remove unused shared memory declaration in SSM_CONV

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Progeny Alpha <ProgenyAlpha@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

2026-03-12 10:03:18 +01:00

cmake

ggml: Skip backend library linking code when GGML_BACKEND_DL=ON (#15094 )

2025-08-07 13:45:41 +02:00

include

llama : enable chunked fused GDN path (#20340 )

2026-03-11 22:46:40 +02:00

src

vulkan: fix SSM_CONV PP scaling with large ubatch sizes (#20379 )

2026-03-12 10:03:18 +01:00

.gitignore

vulkan : cmake integration (#8119 )

2024-07-13 18:12:39 +02:00

CMakeLists.txt

ggml : bump version to 0.9.7 (ggml/1425)

2026-02-15 22:24:29 +02:00