Commit Graph

  • cf21cdf36c kleidiai: add data type check to get_tensor_traits (#20639) Martin Klacer 2026-03-16 19:25:54 +00:00
  • 0ed992973b ci : update labeler (#20629) Sigbjørn Skjæret 2026-03-16 20:24:20 +01:00
  • 1bbec6a75d jinja : add capability check for object args (#20612) Aldehir Rojas 2026-03-16 11:43:14 -05:00
  • f47a246a08 sync : ggml Georgi Gerganov 2026-03-16 14:56:06 +02:00
  • c0ccbd1f86 ggml : try fix arm build (whisper/0) Georgi Gerganov 2026-03-16 09:11:13 +02:00
  • f6da02c3f2 ggml : extend im2col f16 (ggml/1434) David366AI 2026-03-15 15:50:56 -04:00
  • dddca026bf webui: add model information dialog to router mode (#20600) Pascal 2026-03-16 15:38:11 +01:00
  • 3c8521c4f5 llama-graph: replace cont with reshape for alpha in qwen35 (#20640) b8377 Aman Gupta 2026-03-16 22:07:13 +08:00
  • 67a2209fab webui: Add MCP CORS Proxy detection logic & UI (#20167) Aleksander Grygier 2026-03-16 13:05:36 +01:00
  • d65c4f2dc9 Fix model selector locked to first loaded model with multiple models (#20580) Pascal 2026-03-16 12:04:06 +01:00
  • d8c331c0af webui: use date in more human readable exported filename (#19939) Woof Dog 2026-03-16 10:18:13 +00:00
  • 46dba9fce8 vulkan: fix flash attention dot product precision (#20589) b8373 Ruben Ortlam 2026-03-16 10:45:49 +01:00
  • de8f01c2d7 model : wire up Nemotron-H tensors for NVFP4 support (#20561) b8372 Sigbjørn Skjæret 2026-03-16 09:19:16 +01:00
  • 079e5a45f0 convert : support mixed-precision ModelOpt models with per-tensor NVFP4/FP8 quantization (#20539) Richard Davison 2026-03-16 09:18:47 +01:00
  • d3936498a3 common : fix iterator::end() dereference (#20445) b8370 Masato Nakasaka 2026-03-15 23:50:38 -07:00
  • 34818ea6c0 CUDA: GDN hide memory latency (#20537) b8369 Aman Gupta 2026-03-16 11:41:45 +08:00
  • 9e2e2198b0 tools/cli: fix disable reasoning (#20606) b8368 Piotr Wilkin (ilintar) 2026-03-15 22:40:53 +01:00
  • 88915cb55c server : fix wait in test_cancel_requests() test (#20601) Georgi Gerganov 2026-03-15 20:54:37 +02:00
  • ebbf544ed1 sycl : fix for untransposed GDA recurrent state (#20583) b8366 Sigbjørn Skjæret 2026-03-15 19:10:15 +01:00
  • b91d7dfe5b ci : only save openvino caches on github-hosted master (#20593) Sigbjørn Skjæret 2026-03-15 18:58:13 +01:00
  • ae40cd27c8 CUDA: limit number of FA stream-k CUDA blocks (#20586) b8364 Johannes Gäßler 2026-03-15 18:30:47 +01:00
  • ceef6b5233 ggml: avoid creating CUDA context during device init (#20595) b8363 Pascal 2026-03-15 17:42:56 +01:00
  • 07c6a59b4f vendor : update cpp-httplib to 0.38.0 (#20578) b8362 Adrien Gallouët 2026-03-15 17:30:06 +01:00
  • 8b7d340b6f ggml/hip: fix APU compatibility - soft error handling for hipMemAdviseSetCoarseGrain (#20536) b8361 MoonShadow 2026-03-16 00:23:58 +08:00
  • 559646472d fix: prevent nullptr dereference (#20552) b8360 Eric Hsieh 2026-03-15 23:51:49 +08:00
  • cf45437d35 codeowners : use teams (#20526) Sigbjørn Skjæret 2026-03-15 14:26:10 +01:00
  • 9cd4ebcfb1 ci : split build.yml + server.yml (#20546) b8358 Georgi Gerganov 2026-03-15 15:11:17 +02:00
  • 89d0aec042 convert : support contiguous method on lora tensors (#20489) Sigbjørn Skjæret 2026-03-15 12:15:12 +01:00
  • b9da4444df ggml : guard against sumq2 being 0 in IQ4_NL (#20460) b8356 Bartowski 2026-03-15 04:47:28 -04:00
  • 617db241aa cuda : add RDNA4-specific MMVQ parameter table for bs=1 decode (#19478) b8355 PikaPikachu 2026-03-15 15:33:39 +08:00
  • 1a3d8edbba vulkan: use graphics queue on AMD (#20551) b8354 Ruben Ortlam 2026-03-15 08:18:54 +01:00
  • 6b10a82c00 kv-cache : fix reading llama_kv_cell_ext during state read (#20273) b8353 sprayandwipe 2026-03-15 07:11:19 +00:00
  • d23355afc3 model : wire up Qwen3.5/Qwen3.5MoE tensors for NVFP4 support (#20506) b8352 Michael Wand 2026-03-14 14:44:42 -07:00
  • b30a5fdf37 metal : add FA specialization for HSK = 320, HSV = 256 (#20549) b8351 Georgi Gerganov 2026-03-14 23:15:47 +02:00
  • b4768955c4 ci : move self-hosted workflows to separate files (#20540) b8350 Georgi Gerganov 2026-03-14 23:15:35 +02:00
  • fc350fdf96 docker : force Python 3.13 in Vulkan container (#20530) Gerard Guillemas Martos 2026-03-14 21:37:09 +01:00
  • 3a6f059909 ci : try to optimize some jobs (#20521) b8348 Eve 2026-03-14 19:27:52 +00:00
  • 609ea50026 hexagon: Q4_0 and MXFP4 repack fixes (#20527) b8347 Max Krasnyansky 2026-03-14 11:09:08 -07:00
  • 9f774e45ee ci : reduce webgpu tests timeout to 900s (#20538) Georgi Gerganov 2026-03-14 17:08:26 +02:00
  • 94d0262277 mtmd: add llama-mtmd-debug binary (#20508) Xuan-Son Nguyen 2026-03-14 15:52:29 +01:00
  • a93c0ef0fa add op gated_delta_net (#20455) Neo Zhang 2026-03-14 22:01:57 +08:00
  • 710878a7dd webui: restore code preview iframe origin isolation (#20477) Chedrian07 2026-03-14 19:28:28 +09:00
  • 0685848bc6 scripts : remove get-wikitext-103.sh (#20543) Adrien Gallouët 2026-03-14 11:22:04 +01:00
  • 0024a69b70 scripts : update get-hellaswag.sh and get-winogrande.sh (#20542) Adrien Gallouët 2026-03-14 11:21:50 +01:00
  • d0b79aaa2f ggml : add native AVX512-FP16 support for F16 operations (#20529) b8340 Adrien Gallouët 2026-03-14 10:06:14 +01:00
  • f2c0dfb739 Use fp32 in cuBLAS V100 to avoid overflows, env variables to override cuBLAS compute type (#19959) b8339 Wallentri 2026-03-14 10:43:13 +03:00
  • 9789c4ecdc ggml : add OpenVINO backend (#15307) b8338 Zijun Yu 2026-03-14 13:56:55 +08:00
  • 77e20cc107 vendor : update cpp-httplib to 0.37.2 (#20484) b8337 Adrien Gallouët 2026-03-14 06:51:02 +01:00
  • 5a32a9b8a5 Fix data race in CUDA's "cpy" kernel (influences GGML's DUP, CONT operations). (#20507) b8336 Rail Chabdarov 2026-03-14 06:19:44 +01:00
  • 3b439504ba opencl: fix l2_norm (#20480) lhez 2026-03-13 22:18:52 -07:00
  • 463b6a963c tools : enable kvu in perplexity for hellaswag, winogrande, multiple-choice (#19954) b8334 Adrien Gallouët 2026-03-13 21:25:57 +01:00
  • e30f1fdf74 graph : remove redundant GDN state transposes (#20443) b8333 Georgi Gerganov 2026-03-13 22:12:54 +02:00
  • 1430c35948 common/parser: gracefully handle undetected tool parser, print error message. (#20286) b8332 Piotr Wilkin (ilintar) 2026-03-13 20:56:10 +01:00
  • f17b3be63f llama : fix pooling assertion crash in chunked GDN detection path (#20468) b8331 ZeroV0LT 2026-03-13 19:53:42 +01:00
  • d7ba99c485 server: reset counter related to kill-switch on client error (#20513) b8330 SoftwareRenderer 2026-03-13 13:58:09 -04:00
  • fbaa95bc29 ggml-cpu: add RVV vec dot kernels for quantization types (#18859) b8329 rehan-10xengineer 2026-03-13 20:36:04 +05:00
  • b5e1212063 ggml : fix typo gmml (#20512) b8328 Adrien Gallouët 2026-03-13 14:36:13 +01:00
  • 8f974d2392 mtmd : rename mtmd_get_audio_bitrate to mtmd_get_audio_sample_rate (#20105) b8327 Daniel Bevenius 2026-03-13 12:30:02 +01:00
  • 2948e6049a general: CONTRIBUTING.md - guidelines for quantization schemes (#19762) Piotr Wilkin (ilintar) 2026-03-13 12:21:33 +01:00
  • 73c9eb8ced metal : fix l2 norm scale (#20493) b8325 Georgi Gerganov 2026-03-13 11:43:20 +02:00
  • 983df142a9 convert : fix/suppress pyright errors (#20442) Daniel Bevenius 2026-03-13 06:00:52 +01:00
  • 57819b8d4b llama : disable graph reuse with pipeline parallelism (#20463) b8323 Georgi Gerganov 2026-03-12 21:04:13 +02:00
  • 557fe2d913 vendor : update cpp-httplib to 0.37.1 (#20390) b8322 Alessandro de Oliveira Faria (A.K.A.CABELO) 2026-03-12 09:57:06 -03:00
  • 0e810413bb tests : use reasoning instead of reasoning_budget in server tests (#20432) Piotr Wilkin (ilintar) 2026-03-12 13:41:01 +01:00
  • 128142fe7d test-backend-ops: allow loading tests from file and parsing model operators into file (#19896) b8320 Ruben Ortlam 2026-03-12 13:26:00 +01:00
  • 6de1bc631d common : update completion executables list [no ci] (#19934) Daniel Bevenius 2026-03-12 12:12:01 +01:00
  • 0a10c34dc1 grammar: Fix grammar root symbol check (#19761) b8318 Asbjørn Olling 2026-03-12 12:04:56 +01:00
  • deee23863b vulkan: add GATED_DELTA_NET op support (#20334) b8317 ProgenyAlpha 2026-03-12 06:32:04 -04:00
  • c3e3f9e533 convert : better mtp check and fix return [no ci] (#20419) Sigbjørn Skjæret 2026-03-12 10:04:20 +01:00
  • 40c550d4f6 vulkan: fix SSM_CONV PP scaling with large ubatch sizes (#20379) b8315 ProgenyAlpha 2026-03-12 05:03:18 -04:00
  • de190154c8 New conversations now auto-select the first loaded model (#20403) Pascal 2026-03-12 09:07:05 +01:00
  • 05039967da ggml-virtgpu: Fix some build commands (#20341) Masashi Yoshimura 2026-03-12 16:47:45 +09:00
  • e4cff0956b metal : avoid divisions in bin kernel (#20426) b8312 Georgi Gerganov 2026-03-12 09:42:40 +02:00
  • 4cc6eb158c ci: Setup self-hosted CI for Intel Linux Vulkan backend (#20154) Masato Nakasaka 2026-03-11 22:43:22 -07:00
  • 246ffc4b05 vulkan: fix l2_norm epsilon handling (#20350) b8310 Jeff Bolz 2026-03-12 00:39:41 -05:00
  • aa429cf507 vulkan: fix OOB check in flash_attn_mask_opt (#20296) b8309 Jeff Bolz 2026-03-12 00:35:49 -05:00
  • 5866e3bbc8 vulkan: Fix ErrorOutOfHostMemory on Intel GPU when loading large models with --no-mmap (#20059) b8308 Masato Nakasaka 2026-03-11 22:30:16 -07:00
  • 0516e04bf9 opencl: use larger workgroup size for get_rows (#20316) lhez 2026-03-11 22:03:27 -07:00
  • 3d9ab225e7 opencl: add cumsum op (#18981) shaofeiqi 2026-03-11 22:03:07 -07:00
  • d63aa398de hip: compile debug builds with -O2 on hip to avoid a compiler bug (#20392) b8305 uvos 2026-03-12 03:37:10 +01:00
  • a8304b4d27 common/parser: add GigaChatV3/3.1 models support (#19931) b8304 Mishusha 2026-03-12 03:22:25 +03:00
  • fdb17643d3 model : add support for Phi4ForCausalLMV (#20168) b8303 DAN™ 2026-03-11 19:25:54 -04:00
  • 1eea6a2968 graph : add optional scale parameter to build_lora_mm [no ci] (#20427) Richard Davison 2026-03-12 00:22:49 +01:00
  • 4a748b8f15 common : fix --n-cpu-moe, --cpu-moe for models with fused gate + up (#20416) b8301 ddh0 2026-03-11 18:13:28 -05:00
  • f2ab047f27 ggml-webgpu: Add supports for GGML_OP_REPEAT (#20230) b8300 Masashi Yoshimura 2026-03-12 06:40:36 +09:00
  • d28961d81e llama : enable chunked fused GDN path (#20340) b8299 Georgi Gerganov 2026-03-11 22:46:40 +02:00
  • f90bd1dd84 llama : whitespace cleanup (#20422) b8298 Sigbjørn Skjæret 2026-03-11 21:18:29 +01:00
  • 5eae9cb1d9 ggml : add NVFP4 quantization type support (#19769) b8297 Richard Davison 2026-03-11 21:02:54 +01:00
  • 3ca19b0e9f benches : add nemotron super (#20420) Georgi Gerganov 2026-03-11 21:39:40 +02:00
  • eaf1d7930c llama : add support for Nemotron 3 Super (#20411) b8295 Daniel Bevenius 2026-03-11 19:27:53 +01:00
  • 76ea1c1c46 metal : fix capture_compute counter logic (#20410) Georgi Gerganov 2026-03-11 18:38:22 +02:00
  • bd1ec818e9 compare-llama-bench: check remotes as well (#20406) Aman Gupta 2026-03-12 00:14:42 +08:00
  • b541241104 metal : fix q5_k mul_mv register spill (#20399) b8292 Georgi Gerganov 2026-03-11 16:25:27 +02:00
  • c363256839 metal : add env var to trigger graph capture (#20398) b8291 Georgi Gerganov 2026-03-11 16:25:10 +02:00
  • ecac98ee53 [SYCL] Update SYCL.md for binary package for Windows (#20401) Neo Zhang 2026-03-11 22:21:22 +08:00
  • 182acfe5c5 ci: disable coopmat on ubuntu-24-cmake-vulkan job (#20294) Ruben Ortlam 2026-03-11 14:12:29 +01:00
  • b5fe4559ae common/parser: use nlohmann::ordered_json to preserve parameter order (#20385) Aldehir Rojas 2026-03-11 04:26:51 -05:00
  • acb7c79069 common/parser: handle reasoning budget (#20297) b8287 Piotr Wilkin (ilintar) 2026-03-11 10:26:12 +01:00
  • 5f91b1d5d5 ggml-cuda: gdn use shared mem for HIP (#20366) b8286 uvos 2026-03-11 06:06:19 +01:00
  • 9ef7523ee9 cuda/hip: fix loop unrolling in ssm-conv (#20369) b8285 uvos 2026-03-11 06:04:32 +01:00