llama.cpp

Files

Aman Gupta c8a2417d7b CUDA: experimental native mxfp4 support for blackwell (#17906)

* CUDA: experimental native mxfp4 support for blackwell

* optimize load_tiles

* optimize quantize_mxfp4

* cleanup

* first pass review: formatting

* use interleaved layout for mma

* mmq: add assert for size

* use __nv_fp4x4_e2m1

* use iter_k as 512, cleanup

* Use 1200 as blackwell instead of 1000

* address review comments

* mmq: fix stride

* quantize.cu: use reference impl of e8m0 scale

* address review comments

* add 120f-virtual + minor fixes

---------

Co-authored-by: Aman Gupta <aman>

2025-12-24 22:28:26 +08:00

cmake

ggml: Skip backend library linking code when GGML_BACKEND_DL=ON (#15094)

2025-08-07 13:45:41 +02:00

include

llama: automatically set parameters not set by the user in such a way that maximizes GPU utilization (#16653)

2025-12-15 09:24:59 +01:00

src

CUDA: experimental native mxfp4 support for blackwell (#17906)

2025-12-24 22:28:26 +08:00

.gitignore

vulkan : cmake integration (#8119)

2024-07-13 18:12:39 +02:00

CMakeLists.txt

ggml-hexagon: Implement true Q8_0 quantization on Hexagon NPU for more accurate mixed-precision matmul operations (#17977)

2025-12-19 09:42:28 -08:00