llama.cpp/ggml/src at a6cc43c286a2ebc429aa69b9a4d16de082cedb51 - llama.cpp - Sleepy Git

sleepy/llama.cpp

neha-ha a6cc43c286 ggml-webgpu: updated matrix-vector multiplication (#21738)

* merged properly, but slow q3_k and q5_k with u32 indexing

* Start on new mat-vec

* New format float paths working

* Working q4_0

* Work on remaining legacy q-types

* port k-quants to new matvec

* remove old shader

* Remove old constants, format

* remove accidental file

---------

Co-authored-by: Neha Abbas <nehaabbas@ReeseLevines-MacBook-Pro.local>
Co-authored-by: Reese Levine <reeselevine1@gmail.com>

2026-04-20 07:37:17 -07:00

ggml: backend-agnostic tensor parallelism (experimental) (#19378)

2026-04-09 16:42:19 +02:00

ggml: backend-agnostic tensor parallelism (experimental) (#19378)

2026-04-09 16:42:19 +02:00

ggml-cpu: add 128-bit RVV implementation for Quantization Vector Dot (#20633)

2026-04-16 11:15:15 +03:00

CUDA: refactor mma data loading for AMD (#22051)

2026-04-19 18:26:59 +02:00

hexagon: optimize HMX matmul operations (#21071)

2026-04-16 13:48:34 -07:00

ggml: backend-agnostic tensor parallelism (experimental) (#19378)

2026-04-09 16:42:19 +02:00

metal: Implement ROLL op (#21946)

2026-04-16 11:54:37 +03:00

ggml-cuda: native bf16 flash attention for vec kernel (#20525)

2026-03-22 11:05:51 +01:00

opencl: refactor q8_0 set_tensor and mul_mat host side dispatch for Adreno (#21938)

2026-04-16 22:28:33 -07:00

ggml: backend-agnostic tensor parallelism (experimental) (#19378)

2026-04-09 16:42:19 +02:00

rpc : refactor the RPC transport (#21998)

2026-04-19 10:21:53 +03:00

[SYCL] Fix reorder MMVQ assert on unaligned vocab sizes (#22035)

2026-04-20 08:39:45 +03:00

ggml: backend-agnostic tensor parallelism (experimental) (#19378)

2026-04-09 16:42:19 +02:00

vulkan: optimize im2col (#21713)

2026-04-15 19:04:51 +02:00

ggml-webgpu: updated matrix-vector multiplication (#21738)

2026-04-20 07:37:17 -07:00

ggml: backend-agnostic tensor parallelism (experimental) (#19378)

2026-04-09 16:42:19 +02:00

ggml: backend-agnostic tensor parallelism (experimental) (#19378)

2026-04-09 16:42:19 +02:00

CMakeLists.txt

ggml: backend-agnostic tensor parallelism (experimental) (#19378)

2026-04-09 16:42:19 +02:00

ggml-alloc.c

ggml : remove ggml-ext.h (#21869)

2026-04-14 17:32:58 +03:00

ggml-backend-dl.cpp

hexagon: enable offloading to Hexagon on Windows on Snapdragon (#19150)

2026-01-29 12:33:21 -08:00

ggml-backend-dl.h

hexagon: enable offloading to Hexagon on Windows on Snapdragon (#19150)

2026-01-29 12:33:21 -08:00

ggml-backend-impl.h

ggml: backend-agnostic tensor parallelism (experimental) (#19378)

2026-04-09 16:42:19 +02:00

ggml-backend-meta.cpp

ggml : reduce CPU overhead in meta backend (#22041)

2026-04-19 12:48:35 +03:00

ggml-backend-reg.cpp

ggml : add OpenVINO backend (#15307)

2026-03-14 07:56:55 +02:00

ggml-backend.cpp

ggml: add graph_reused (#21764)

2026-04-16 17:21:28 +08:00

ggml-common.h

ggml: add Q1_0 1-bit quantization support (CPU) (#21273)

2026-04-06 20:55:21 +02:00

ggml-impl.h

ggml: add graph_reused (#21764)

2026-04-16 17:21:28 +08:00

ggml-opt.cpp

fix: free ctx_copy in ggml_opt_free to plug per-training-session leak (#21592)

2026-04-08 17:40:15 +02:00

ggml-quants.c

ggml: add Q1_0 1-bit quantization support (CPU) (#21273)

2026-04-06 20:55:21 +02:00

ggml-quants.h

ggml: add Q1_0 1-bit quantization support (CPU) (#21273)

2026-04-06 20:55:21 +02:00

ggml-threading.cpp

ggml : build backends as libraries (#10256)

2024-11-14 18:04:35 +01:00

ggml-threading.h

remove CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS (#10797)

2024-12-12 19:02:49 +01:00

ggml.c

ggml: add graph_reused (#21764)

2026-04-16 17:21:28 +08:00

ggml.cpp

ggml : Print backtrace on uncaught C++ exceptions (ggml/1232)

2025-06-01 13:43:57 +03:00

gguf.cpp

llama: fix llama-model-saver (#20503)

2026-03-25 12:53:16 +02:00