Commit Graph

  • 750579ff14 common: Refactoring sampler parameters (#20429) (#22233) b8884 Ethan Turner 2026-04-22 01:40:19 -07:00
  • 134d6e54d4 common/chat, server: refactor, move all conversion functions to common, add tests (#20690) b8883 Piotr Wilkin (ilintar) 2026-04-22 10:28:45 +02:00
  • ca7f7b7b94 ggml-webgpu(shader): support conv2d kernels. (#21964) b8882 Chen Yuan 2026-04-21 23:18:57 -04:00
  • 0dedb9ef7a hexagon: add support for FILL op (#22198) b8881 Aparna M P 2026-04-22 04:54:20 +05:30
  • 2799d933b5 ggml-webgpu: reset CPU/GPU profiling time when freeing context (#22050) b8880 Masashi Yoshimura 2026-04-22 08:05:21 +09:00
  • 04fe84b69d server: allow cancel loading model (#21814) Xuan-Son Nguyen 2026-04-22 00:26:09 +02:00
  • 5a4cd6741f Hexagon: DAIG op (#22195) b8878 Shreya Jain 2026-04-21 14:16:04 -07:00
  • 2248799a58 hexagon: fix missing v79 entry in libggml-htp.inf (#22194) Mengsheng Wu 2026-04-22 04:53:44 +08:00
  • 72d693e4fb spec : reset i_last when low acceptance streak occurs (#22168) b8876 Paul Dubs 2026-04-21 20:29:07 +02:00
  • 98d2d2884e mtmd: Add support for Reka Edge 2603 (#21616) b8875 Kwa Jie Hao 2026-04-22 02:02:49 +08:00
  • 84652b80cf arg : add --spec-default (#22223) b8874 Georgi Gerganov 2026-04-21 19:52:02 +03:00
  • 52f1096f21 openvino: driver setup, CI split, thread safety, and NPU optimizations (#21944) b8873 Zijun Yu 2026-04-21 23:58:34 +08:00
  • 606fa42f5d vendor : update cpp-httplib to 0.43.1 (#22143) b8872 Alessandro de Oliveira Faria (A.K.A.CABELO) 2026-04-21 11:45:48 -03:00
  • 7fc1c4ef78 metal : workaround macOS GPU interactivity watchdog (#22216) b8871 Georgi Gerganov 2026-04-21 17:24:55 +03:00
  • 82209efb7e vulkan: Support F16 OP_FILL (#22177) b8870 Jeff Bolz 2026-04-21 11:01:56 +02:00
  • 9998d88bc8 mtmd: correct mtmd_decode_use_mrope() (#22188) b8869 Xuan-Son Nguyen 2026-04-21 10:53:37 +02:00
  • cd03ec7642 llama-ext : fix exports (#22202) b8868 Georgi Gerganov 2026-04-21 11:04:46 +03:00
  • 4889afba5f sync : ggml Georgi Gerganov 2026-04-21 11:03:42 +03:00
  • 041fe83d74 ggml : bump version to 0.10.0 (ggml/1463) Georgi Gerganov 2026-04-21 11:02:56 +03:00
  • cfe9838d26 fit-params : refactor + add option to output estimated memory per device (#22171) Georgi Gerganov 2026-04-21 09:54:36 +03:00
  • ff6b1062af server : fix hardcoded proxy connection timeout in router mode (#18760) (#22003) b8864 xris99 2026-04-21 06:41:14 +02:00
  • 97895129e5 ggml-cuda: flush legacy pool on OOM and retry (#22155) b8863 leonardHONG 2026-04-21 05:30:38 +08:00
  • 86f8daacfe mtmd: correct get_n_pos / get_decoder_pos (#22175) b8862 Xuan-Son Nguyen 2026-04-20 23:29:19 +02:00
  • cf8b0dbda9 server : remove /api endpoints (#22165) b8861 Georgi Gerganov 2026-04-20 20:41:19 +03:00
  • fd6ae4ca1c Tensor-parallel: Fix delayed AllReduce on Gemma-4 MoE (#22129) b8860 Gaurav Garg 2026-04-20 21:55:39 +05:30
  • fb19f94c71 TP: fix 0-sized tensor slices, AllReduce fallback (#21808) b8859 Johannes Gäßler 2026-04-20 18:09:39 +02:00
  • 7f251fdbce ggml-cpu: Optimized x86 and generic cpu q1_0 dot (follow up) (#21636) b8858 pl752 2026-04-20 21:02:54 +05:00
  • a6cc43c286 ggml-webgpu: updated matrix-vector multiplication (#21738) b8857 neha-ha 2026-04-20 07:37:17 -07:00
  • a678916623 mtmd: refactor mtmd_decode_use_mrope (#22161) Xuan-Son Nguyen 2026-04-20 14:45:11 +02:00
  • 81df3f7cfa fix: GLM-DSA crash in llama-tokenize when using vocab_only (#22102) b8855 SamareshSingh 2026-04-20 02:32:46 -05:00
  • de71b5f81c server : refactor "use checkpoint" logic (#22114) b8854 Georgi Gerganov 2026-04-20 08:42:37 +03:00
  • 788fcbc5dd [SYCL] Fix reorder MMVQ assert on unaligned vocab sizes (#22035) b8853 Katostrofik 2026-04-20 01:39:45 -04:00
  • 9d49acb2a7 server: rename --clear-idle to --cache-idle-slots (#21741) b8852 Yes You Can Have Your Own 2026-04-20 08:30:24 +03:00
  • e365e658f0 vendor : update cpp-httplib to 0.42.0 (#21781) b8851 Alessandro de Oliveira Faria (A.K.A.CABELO) 2026-04-19 19:41:43 -03:00
  • 4eac5b4509 CUDA: refactor mma data loading for AMD (#22051) b8850 Johannes Gäßler 2026-04-19 18:26:59 +02:00
  • d5b780a676 common/autoparser : allow space after tool call (#22073) b8849 Aldehir Rojas 2026-04-19 06:28:35 -05:00
  • 471540ae8a HIP: Remove unnecessary NCCL_CHECK (#21914) b8848 uvos 2026-04-19 12:59:44 +02:00
  • 19124078be mtmd: add pos_0 to mtmd_image_tokens_get_decoder_pos (breaking change) (#22082) b8847 Xuan-Son Nguyen 2026-04-19 11:57:21 +02:00
  • bcdcc1044f ggml : reduce CPU overhead in meta backend (#22041) b8846 Gaurav Garg 2026-04-19 15:18:35 +05:30
  • 037bfe38d0 ci : install spirv-headers for vulkan-cross (#22109) Sigbjørn Skjæret 2026-04-19 09:32:08 +02:00
  • 8685e7b075 convert : support sentence-transformer 5.4 config files (#22087) Dowon 2026-04-19 16:25:39 +09:00
  • 09b4efa95f cmake: remove CMP0194 policy to restore MSVC builds (#21934) b8843 texasich 2026-04-19 02:25:05 -05:00
  • 455d8e4be8 server : speculative checkpointing (#19493) b8842 Sascha Rogmann 2026-04-19 09:24:06 +02:00
  • 91fef95362 rpc : refactor the RPC transport (#21998) b8841 Radoslav Gerganov 2026-04-19 10:21:53 +03:00
  • 9e5647affa server: Expose media_tag on /props endpoint. (#22028) b8840 Cetarthoriphros 2026-04-18 19:27:17 -03:00
  • 4f02d47339 model : refactor bias tensor variable names (#22079) b8839 Sigbjørn Skjæret 2026-04-18 20:12:00 +02:00
  • 23b8cc4991 android : libcommon -> libllama-common (#22076) b8838 Sigbjørn Skjæret 2026-04-18 11:19:40 +02:00
  • 59accc8863 ggml-backend-meta: add multi-segment read support in get_tensor (#22063) b8837 SamareshSingh 2026-04-18 03:04:51 -05:00
  • 83d58e02fc ci : free disk space for rocm release (#22012) b8836 Sigbjørn Skjæret 2026-04-18 09:37:30 +02:00
  • 89a5474f0e convert : fix (ignore for now) typings errors (#22002) Sigbjørn Skjæret 2026-04-18 09:36:41 +02:00
  • fd1c0ec3f0 llama: fit ctx size for CPU only (#21568) Johannes Gäßler 2026-04-18 08:16:04 +02:00
  • 45cac7ca70 ggml-webgpu: fix compiler warnings and refactor FlashAttention encoding (#21052) b8833 Reese Levine 2026-04-17 09:17:11 -07:00
  • b94050e896 CUDA: use LRU based eviction for cuda graphs (#21611) b8832 Aman Gupta 2026-04-17 23:24:21 +08:00
  • a279d0f0f4 ci : add android arm64 build and release (#21647) b8831 Yuri Khrustalev 2026-04-17 05:32:24 -04:00
  • 268d61e178 mtmd: add missing struct tag (#22023) b8830 65a 2026-04-17 01:48:33 -07:00
  • 6990e2f1f7 libs : rename libcommon -> libllama-common (#21936) b8829 Georgi Gerganov 2026-04-17 11:11:46 +03:00
  • fcc7508759 model : Gemma4 model type detection (#22027) b8828 Eric Zhang 2026-04-17 16:07:11 +08:00
  • 5e6c0e18b6 opencl: refactor q8_0 set_tensor and mul_mat host side dispatch for Adreno (#21938) b8827 lhez 2026-04-16 22:28:33 -07:00
  • 30dce2cf29 cli : use get_media_marker (#22017) b8826 Sigbjørn Skjæret 2026-04-17 00:12:31 +02:00
  • 089dd41fe3 cmake: use glob to collect src/models sources (#22005) b8825 Xuan-Son Nguyen 2026-04-16 23:25:16 +02:00
  • 85dde8dc4a hexagon: optimize HMX matmul operations (#21071) b8824 nullname 2026-04-17 04:48:34 +08:00
  • 4fbdabdc61 model: using single llm_build per arch (#21970) b8823 Xuan-Son Nguyen 2026-04-16 21:10:22 +02:00
  • e45dbdece8 opencl: add q5_K gemm and gemv kernels for Adreno (#21595) b8822 shaofeiqi 2026-04-16 12:08:33 -07:00
  • 4adac43f6f server: tests: fetch random media marker via /apply-template (#21962) (#21980) b8821 Pascal 2026-04-16 19:46:21 +02:00
  • 9db77a020c model : refactor QKV into common build_qkv and create_tensor_qkv helpers (#21245) PikaPikachu 2026-04-16 23:41:34 +08:00
  • f772f6e434 model : support NVFP4 tensors for Gemma4 (#21971) Sigbjørn Skjæret 2026-04-16 16:51:47 +02:00
  • b572d1ecd6 codeowners: add team member comments (#21714) Ruben Ortlam 2026-04-16 12:13:11 +02:00
  • 03b3d07798 Convert: Fix NemotronH Config Parsing (#21664) Anav Prasad 2026-04-16 10:11:45 +00:00
  • 3f7c29d318 ggml: add graph_reused (#21764) b8816 Aman Gupta 2026-04-16 17:21:28 +08:00
  • ae2d34899e metal: Implement ROLL op (#21946) b8815 Kusha Gharahi 2026-04-16 03:54:37 -05:00
  • 1e796eb41f ggml-cpu: add 128-bit RVV implementation for Quantization Vector Dot (#20633) b8814 rehan-10xengineer 2026-04-16 13:15:15 +05:00
  • 5637536517 ggml : implemented simd_gemm kernel for riscv vector extension (#20627) b8813 rehan-10xengineer 2026-04-16 13:14:26 +05:00
  • 90fb96a7b3 devops : added spirv-headers to nix (#21965) Yuannan 2026-04-16 08:12:52 +00:00
  • 82677a6ede ggml-webgpu: compute pass batching and removing profiling overhead (#21873) b8811 Reese Levine 2026-04-16 01:12:19 -07:00
  • 8612ed18b7 ci : Use ggml-org/ccache-action on RISC-V as well (#21632) Ludovic Henry 2026-04-16 10:11:25 +02:00
  • b1be68e8ca [SYCL] Fix Q8_0 reorder: garbage on 2nd prompt + crash on full VRAM (#21638) b8809 Katostrofik 2026-04-16 01:34:05 -04:00
  • 408225bb1a server: use random media marker (#21962) b8808 Xuan-Son Nguyen 2026-04-15 23:52:22 +02:00
  • b3d758750a vulkan: optimize im2col (#21713) b8807 Ruben Ortlam 2026-04-15 19:04:51 +02:00
  • 7e72b38bc1 cuda: Q1_0 initial backend (#21629) b8806 Pasha Khosravi 2026-04-15 09:38:38 -07:00
  • 20d3bc2cc8 ggml-webgpu: Fix dequantization helpers to not pass in pointers (#21872) Reese Levine 2026-04-15 09:14:40 -07:00
  • a6206958d2 CUDA: require explicit opt-in for P2P access (#21910) b8804 Johannes Gäßler 2026-04-15 16:01:46 +02:00
  • 014dca49d6 CUDA: manage NCCL communicators in context (#21891) Johannes Gäßler 2026-04-15 15:58:40 +02:00
  • adb541a6ad rpc : add native RDMA transport for RPC backend (RoCEv2) (#20590) b8802 Valeriy Dubov 2026-04-15 16:44:02 +03:00
  • 80d8770804 docs: more extensive RoPE documentation [no ci] (#21953) Xuan-Son Nguyen 2026-04-15 14:45:16 +02:00
  • 8dc530b86d ci: disable test-backend-ops on Vulkan llvmpipe run and restore default timeout (#21901) Ruben Ortlam 2026-04-15 10:55:21 +02:00
  • e1a9a6dcbe autoparser: support case of JSON_NATIVE with per-call markers (test case: Reka-Edge) (#21892) b8799 Piotr Wilkin (ilintar) 2026-04-15 10:51:50 +02:00
  • e39eba26f3 read n_ctx back after making llama_context (#21939) b8798 Matt 2026-04-15 00:24:57 -07:00
  • 5d14e5d19b hexagon: optimization for HMX mat_mul (#21554) b8797 Yiwei Shao 2026-04-14 14:09:03 -07:00
  • fae3a28070 ggml : remove ggml-ext.h (#21869) b8796 Xuan-Son Nguyen 2026-04-14 16:32:58 +02:00
  • c0de6eda72 metal : fix FA support logic (#21898) b8795 Georgi Gerganov 2026-04-14 17:32:29 +03:00
  • 707c0b7a6e mtmd: add mtmd_image_tokens_get_decoder_pos() API (#21851) b8794 Xuan-Son Nguyen 2026-04-14 16:07:41 +02:00
  • 1f30ac0cea vulkan: Programmatically add RoundingModeRTE to all shaders when the device supports it (#21572) b8793 Jeff Bolz 2026-04-14 15:17:45 +02:00
  • f4b5bf2f32 ci : re-enable mac workflows (#21894) b8792 Georgi Gerganov 2026-04-14 15:58:09 +03:00
  • aa0f1897b7 metal : add XIELU unary op (#20802) b8791 Seyoung Jeong 2026-04-14 21:43:59 +09:00
  • be76dd0bb2 vendor : update BoringSSL to 0.20260413.0 (#21881) b8790 Adrien Gallouët 2026-04-14 13:25:09 +02:00
  • 2e05f06ffb ggml : fix ARM NEON nvfp4 dot product on non-dotprod targets (#21559) b8789 Richard Davison 2026-04-14 13:23:45 +02:00
  • acc37a42ea cmake: fix CMP0194 warning on Windows with MSVC (#21630) b8788 texasich 2026-04-14 05:47:56 -05:00
  • 5a23695d5a ggml-webgpu: Update register tiling matmul to use f32 accumulation (#21644) b8787 Reese Levine 2026-04-14 03:46:41 -07:00
  • 56666fa607 common: skip reasoning budget sampler when no budget is requested (#21870) b8786 Berk Idem 2026-04-14 06:43:06 -04:00
  • 6a6780a232 vulkan: Support GGML_TYPE_NVFP4 (#21455) b8785 Jeff Bolz 2026-04-14 11:34:23 +02:00