llama.cpp

Akarshan Biswas b54cb2e3d0 sycl : add flash-attn support for head size 512 (#21654)

* sycl : add flash-attn support for head size 512

This patch extends the SYCL Flash Attention implementation to support head sizes (DKQ/DV) of 512.

Changes:
- Added DKQ/DV 512 cases to both tile and vector Flash Attention kernels.
- Updated kernel selection logic to allow vector kernels for head sizes up to 512 (previously 256).
- Removed unused/redundant AMD and RDNA-specific configuration functions in `fattn-tile.hpp`.
- Refactored `ggml_backend_sycl_buffer_init_tensor` to use a switch statement for clearer tensor extra buffer initialization.
- Added necessary template instances for the new 512 head size across various quantization types.

* remove defunct mxfp4 reorder from setting buffer type

2026-04-09 09:36:48 +03:00

cmake

ggml: Skip backend library linking code when GGML_BACKEND_DL=ON (#15094)

2025-08-07 13:45:41 +02:00

include

ggml : deprecate GGML_OP_ADD1 (#21363)

2026-04-07 15:28:27 +03:00

src

sycl : add flash-attn support for head size 512 (#21654)

2026-04-09 09:36:48 +03:00

.gitignore

vulkan : cmake integration (#8119)

2024-07-13 18:12:39 +02:00

CMakeLists.txt

ggml : bump version to 0.9.11 (ggml/1456)

2026-04-02 10:39:00 +03:00