llama.cpp

Alfred ce734a8a2f ggml-hexagon: Implement true Q8_0 quantization on Hexagon NPU for more accurate mixed-precision matmul operations (#17977)

* feat: implement real Q8_0

* feat: adding cmake option for configuring FP32 quantize group size

* typo: set() shall be used

---------

Co-authored-by: ngdxzy <zhenyu_xu@uri.edu>

2025-12-19 09:42:28 -08:00

android

android: fix missing screenshots for Android.md (#18156)

2025-12-19 09:32:04 +02:00

backend

ggml-hexagon: Implement true Q8_0 quantization on Hexagon NPU for more accurate mixed-precision matmul operations (#17977)

2025-12-19 09:42:28 -08:00

development

arch: refactor LLM_TENSOR_NAMES (#18051)

2025-12-16 13:22:30 +01:00

multimodal

model : support MiniCPM-V 4.5 (#15575)

2025-08-26 10:05:55 +02:00

ops

[SYCL] Support gpt-oss by OPs add-id, mul_mat for mxfp4, swiglu_oai (#17826)

2025-12-15 10:35:15 +08:00

android.md

android: fix missing screenshots for Android.md (#18156)

2025-12-19 09:32:04 +02:00

build-riscv64-spacemit.md

ggml-cpu: add ggml_thread_cpu_relax with Zihintpause support (#17784)

2025-12-08 10:41:34 +02:00

build-s390x.md

ggml-zdnn: fix #15414, activate FP16 and BF16 acceleration and incorrect zTensor free (#15839)

2025-09-13 02:39:52 +08:00

build.md

ggml-zendnn : add ZenDNN backend for AMD CPUs (#17690)

2025-12-07 00:13:33 +08:00

docker.md

CLI: fixed adding cli and completion into docker containers, improved docs (#18003)

2025-12-16 11:52:23 +01:00

function-calling.md

server : add documentation for parallel_tool_calls param (#15647)

2025-08-29 20:25:40 +03:00

install.md

docs : add "Quick start" section for new users (#13862)

2025-06-03 13:09:36 +02:00

llguidance.md

llguidance build fixes for Windows (#11664)

2025-02-14 12:46:08 -08:00

multimodal.md

mtmd : add support for Voxtral (#14862)

2025-07-28 15:01:48 +02:00

ops.md

[SYCL] Support gpt-oss by OPs add-id, mul_mat for mxfp4, swiglu_oai (#17826)

2025-12-15 10:35:15 +08:00