Commit Graph

  • e489a5ca0e server: support OAI /v1/audio/transcriptions API (#21863) b8784 Xuan-Son Nguyen 2026-04-14 11:09:52 +02:00
  • e21cdc11a0 common/gemma4 : handle parsing edge cases (#21760) b8783 Aldehir Rojas 2026-04-13 18:18:18 -05:00
  • e974923698 docs: listing qwen3-asr and qwen3-omni as supported (#21857) Xuan-Son Nguyen 2026-04-13 22:28:17 +02:00
  • 1c0d9081fd chat: dedicated DeepSeek v3.2 parser + "official" template (#21785) b8781 Piotr Wilkin (ilintar) 2026-04-13 22:23:53 +02:00
  • a8bad3842e ci: Also exempt 'security' tag from auto-close (#21844) Christian Kastner 2026-04-13 19:18:44 +02:00
  • 75f3bc94e6 vulkan: Flash Attention DP4A shader for quantized KV cache (#20797) b8779 Ruben Ortlam 2026-04-13 14:21:31 +02:00
  • aa00911d12 common : add download cancellation and temp file cleanup (#21813) b8778 Adrien Gallouët 2026-04-13 11:18:23 +02:00
  • ce8fd4b1a6 server: Expose build_info in router mode (#21835) b8777 Gaspard Petit 2026-04-13 05:14:42 -04:00
  • 9f5e1edb10 CUDA: Limit DeviceSegmentedSort to immediate mode (#21718) b8776 Oliver Simons 2026-04-13 11:14:06 +02:00
  • 920b3e78cb mtmd: use causal attn for gemma 4 audio (#21824) b8775 Xuan-Son Nguyen 2026-04-13 09:47:55 +02:00
  • 974c8c94cc webui: add setting for first-line chat titles (#21797) Rohan Jain 2026-04-13 13:00:46 +05:30
  • 227ed28e12 webui: MCP Diagnostics improvements (#21803) Aleksander Grygier 2026-04-13 07:58:38 +02:00
  • bafae27654 Remove extra conditional check on debug mode. (#21798) b8772 Masashi Yoshimura 2026-04-13 12:13:04 +09:00
  • 873c825611 sycl: disable Q1_0 in backend and cleanup unused variables (#21807) b8771 Akarshan Biswas 2026-04-13 07:14:58 +05:30
  • 82764d8f40 mtmd: fix crash when sending image under 2x2 pixels (#21711) b8770 Sergiu 2026-04-13 00:59:21 +03:00
  • 21a4933042 mtmd: qwen3 audio support (qwen3-omni and qwen3-asr) (#19441) b8769 Xuan-Son Nguyen 2026-04-12 23:57:25 +02:00
  • 1e9d771e2c convert : force f16 or f32 on step3-vl conv weights (#21646) Sigbjørn Skjæret 2026-04-12 19:22:29 +02:00
  • aa4695c5e5 mtmd: add gemma 4 test (vision + audio) [no ci] (#21806) Xuan-Son Nguyen 2026-04-12 16:29:03 +02:00
  • 547765a93e mtmd: add Gemma 4 audio conformer encoder support (#21421) b8766 Stephen Cox 2026-04-13 00:15:26 +12:00
  • 9e209c5aee fix: Proper messages rendering for "Show raw output" (#21672) Aleksander Grygier 2026-04-12 13:08:11 +02:00
  • 6313acbef0 docs: add guide on how to add multimodal support (#21778) Xuan-Son Nguyen 2026-04-12 13:02:38 +02:00
  • ff5ef82786 CUDA: skip compilation of superfluous FA kernels (#21768) b8763 Johannes Gäßler 2026-04-11 18:52:11 +02:00
  • 073bb2c20b mtmd : add MERaLiON-2 multimodal audio support (#21756) b8762 Sirui He 2026-04-11 20:15:48 +08:00
  • af1127d3c4 opencl: add basic support for q5_k (#21593) b8761 shaofeiqi 2026-04-11 01:46:19 -07:00
  • 865ff06b2f TP: fix Qwen 3 Next data split (#21732) b8760 Johannes Gäßler 2026-04-11 09:23:42 +02:00
  • 2b2cd57de6 ggml : fix a few instances of missing GGML_TYPE_Q1_0 cases (#21716) b8759 Sigbjørn Skjæret 2026-04-11 08:45:00 +02:00
  • 660386f6f8 py : Bump typer to latest to fix huggingface_hub issue (#21701) Bartowski 2026-04-11 02:44:15 -04:00
  • a29e4c0b7b CUDA: also store node->src ne/nb for graph equality (#21736) b8757 Aman Gupta 2026-04-11 10:30:30 +08:00
  • b136b62cf9 fix: Fix broken structured output when using $refs in json_schema (#21699) b8756 Galunid 2026-04-11 01:26:36 +02:00
  • 81069a808a hexagon: add support for linux on snapdragon (#21707) b8755 Todor Boinovski 2026-04-10 15:57:23 -07:00
  • 9aa2807769 hexagon: improved Op queuing, buffer and cache management (#21705) b8754 Max Krasnyansky 2026-04-10 15:47:43 -07:00
  • 3fc65063d9 common : better align to the updated official gemma4 template (#21704) b8753 Aldehir Rojas 2026-04-10 16:12:53 -05:00
  • 05b3caaa48 common : add callback interface for download progress (#21735) b8752 Adrien Gallouët 2026-04-10 22:17:00 +02:00
  • e62fa13c24 model : make Gemma 4 shared-KV tail attn_k tensors optional on load (#21739) b8751 MoonRide303 2026-04-10 21:45:50 +02:00
  • bfd1f453cb ggml-webgpu: support non-square subgroup matrix configs for Intel GPUs (#21669) b8750 Rithik Sharma 2026-04-10 10:52:38 -07:00
  • e4fed9d08d ggml-webgpu: address quantization precision and backend lifecycle managment (#21521) b8749 Chen Yuan 2026-04-10 13:52:01 -04:00
  • 5dd102539b server : ignore --alias when using --models-preset (#21380) b8748 Adrien Gallouët 2026-04-10 17:42:56 +02:00
  • fb38d6f278 common : fix when loading a cached HF models with unavailable API (#21670) b8747 Adrien Gallouët 2026-04-10 16:37:46 +02:00
  • 0893f50f2d common: mark --split-mode tensor as experimental (#21684) b8746 Johannes Gäßler 2026-04-10 12:27:27 +02:00
  • f989a6e39e webui: Static build output improvements (#21667) Aleksander Grygier 2026-04-10 11:49:47 +02:00
  • d7ff074c87 common : enable reasoning budget sampler for gemma4 (#21697) b8744 Berk Idem 2026-04-10 05:49:14 -04:00
  • 3f8752b559 docs : fix broken link to ggml-openvino in OPENVINO.md (#21709) Belem Zhang 2026-04-10 15:50:08 +08:00
  • 7b69125331 vulkan: Support Q1_0 (#21539) b8742 Jeff Bolz 2026-04-10 01:35:27 -05:00
  • e095a482a0 common : add fluidity to the progress bar (#21671) b8741 Adrien Gallouët 2026-04-10 08:24:53 +02:00
  • e34f042154 CUDA: fuse muls (#21665) b8740 Aman Gupta 2026-04-10 10:24:09 +08:00
  • d132f22fc9 HIP: add CDNA4 (gfx950) architecture support for MI350X/MI355X (#21570) b8739 andyluo7 2026-04-09 22:13:32 +03:00
  • d6f3030047 ggml: backend-agnostic tensor parallelism (experimental) (#19378) b8738 Johannes Gäßler 2026-04-09 16:42:19 +02:00
  • 009a113326 ggml : check return value of CUB calls used in argsort and top-k (they all return cudaError_t) (#21676) b8737 fairydreaming 2026-04-09 15:17:11 +02:00
  • c8ac02fa1b requirements : update transformers to 5.5.1 (#21617) Daniel Bevenius 2026-04-09 12:36:29 +02:00
  • 4ef9301e4d webui: add "Send message on Enter" setting (#21577) JvM 2026-04-09 12:26:27 +02:00
  • ddf03c6d9a common : fix ambiguous grammar rule in gemma4 (#21661) b8734 Aldehir Rojas 2026-04-09 05:25:07 -05:00
  • 26229755c5 common : simplify autoparser tagged parser rules (#21216) b8733 Aldehir Rojas 2026-04-09 05:24:20 -05:00
  • 057dba336e model: fix multimodal padding token for gemma3n/gemma4 (#21625) b8732 Xuan-Son Nguyen 2026-04-09 12:18:23 +02:00
  • 501aeed18f mtmd: support dots.ocr (#17575) b8731 Xuan-Son Nguyen 2026-04-09 12:16:38 +02:00
  • 0ec191e1d7 vocab: add gemma4 tokenizer tests, fix edge case (#21534) b8730 Piotr Wilkin (ilintar) 2026-04-09 11:41:14 +02:00
  • 243532e556 jinja : support ensure_ascii=true, string repetition and int/float self-filtering (#21623) b8729 Kwa Jie Hao 2026-04-09 17:28:33 +08:00
  • 5e9c635463 metal : add missing mm-id specializations for q1_0 (#21662) b8728 Georgi Gerganov 2026-04-09 10:54:00 +03:00
  • 9949ad08f6 fix: Model Selector choice sync (#21628) Aleksander Grygier 2026-04-09 09:46:27 +02:00
  • 3ee9da0e4f server : fix grammar commandline args (#21543) b8726 AUTOMATIC1111 2026-04-09 10:16:54 +03:00
  • 75511a8d7e webui: Add option to pre-encode conversation for faster next turns (#21034) Aleksander Grygier 2026-04-09 09:10:18 +02:00
  • b54cb2e3d0 sycl : add flash-attn support for head size 512 (#21654) b8724 Akarshan Biswas 2026-04-09 12:06:48 +05:30
  • 8a65a7a8ee ci: drop v5 all: composition from labeler.yml (#21627) Marxist-Leninist 2026-04-09 07:20:19 +01:00
  • 8a132faaa0 vulkan: unify type macros to use Vx instead of _VECx (#21605) b8722 Ruben Ortlam 2026-04-09 07:31:51 +02:00
  • 4293919068 common : skip non-primary GGUF split files when selecting model (#21633) b8721 Adrien Gallouët 2026-04-09 07:28:06 +02:00
  • d12cc3d1ca CUDA: also store node->src->data ptrs for equality check (#21635) b8720 Aman Gupta 2026-04-09 01:01:56 +08:00
  • 2dcb7f74ed fix: free ctx_copy in ggml_opt_free to plug per-training-session leak (#21592) b8719 RealOrko 2026-04-08 16:40:15 +01:00
  • 660600081f server: respect the ignore eos flag (#21203) b8718 Yuri Khrustalev 2026-04-08 11:12:15 -04:00
  • d9a12c82f0 vocab : remove </s> eog token if gemma4 (#21492) b8717 Aldehir Rojas 2026-04-08 09:53:06 -05:00
  • 4a05e0c566 webui : send both backend_sampling == false/true (#18781) Georgi Gerganov 2026-04-08 17:35:52 +03:00
  • e9fd96283d Propose fix a couple of typos (#21581) b8715 John Eismeier 2026-04-08 10:29:03 -04:00
  • 3ba12fed0a kv-cache : extend cache quantization checks (#21586) b8714 Erik Scholz 2026-04-08 15:08:57 +02:00
  • 5473949070 webgpu : Query for adapter support when registering WebGPU backend (#21579) b8713 Reese Levine 2026-04-08 06:08:29 -07:00
  • dcdcbad42a metal: Q1_0 backend (#21528) b8712 Pasha Khosravi 2026-04-08 06:07:47 -07:00
  • 5764d7c6a6 gemma : perform per-layer projections in the first layer (#21612) b8711 Georgi Gerganov 2026-04-08 16:06:30 +03:00
  • 87f4744a80 examples : disable cb_eval callback for --save-logits (#21553) b8710 Daniel Bevenius 2026-04-08 14:10:33 +02:00
  • 85d482e6b6 parser: fix MiniMax handling (#21573) b8709 Piotr Wilkin (ilintar) 2026-04-08 12:47:25 +02:00
  • ae65fbdf33 tests : remove obsolete .mjs script (#21615) b8708 Georgi Gerganov 2026-04-08 13:20:46 +03:00
  • 3bd9aa1f92 chore: Update labeler to have separate labels for server/webui and server changes (#21567) Aleksander Grygier 2026-04-08 10:35:31 +02:00
  • ece522f98c chore: Remove legacy files (#21606) Aleksander Grygier 2026-04-08 09:55:08 +02:00
  • 09343c0198 model : support step3-vl-10b (#21287) b8705 forforever73 2026-04-08 15:51:31 +08:00
  • 97508acb17 webui: fix syntax highlighting lost after streaming for non-common languages (#21206) Hamish M. Blair 2026-04-07 23:58:08 -07:00
  • 5c4aae66e1 devops: kleidiai: provide KleidiAI-Enabled ARM Release Artifact (#21259) b8703 Martin Klacer 2026-04-08 06:06:12 +01:00
  • c5ce4bc227 CUDA: make cuda graphs props check faster (#21472) b8702 Aman Gupta 2026-04-08 09:05:51 +08:00
  • 66c4f9ded0 ggml-cuda: ds_read_b128 for q4_0 and q4_1 mmq kernels (#21168) b8701 iacopPBK 2026-04-07 21:47:42 +02:00
  • 93bdc61563 gguf-py : fix missing comma after bad merge in tensor-mapping (#21558) Daniel Bevenius 2026-04-07 21:24:25 +02:00
  • 4eb19514dd kv-cache : support attention rotation for heterogeneous iSWA (#21513) b8699 Georgi Gerganov 2026-04-07 20:31:28 +03:00
  • 957d717ce5 ggml-webgpu: parameterize submission size and add iOS specific limits (#21533) b8698 Reese Levine 2026-04-07 10:30:01 -07:00
  • de1aa6fa73 CUDA: check for buffer overlap before fusing (#21566) b8697 Aman Gupta 2026-04-08 00:57:04 +08:00
  • 69c28f1547 llama-server: fix model params not propagated (#21509) b8696 Aaron Teo 2026-04-07 21:39:41 +08:00
  • 0d049d6a92 unicode : add custom Qwen2 regex handler to fix segfault on long input (#21257) Son H. Nguyen 2026-04-07 22:13:38 +09:00
  • a8ec0df461 llama: remove per-arch tensor name lists (#21531) b8694 Johannes Gäßler 2026-04-07 15:02:03 +02:00
  • e8f5082697 server : fix restore for checkpoints with pos_min == 0 (#21510) b8693 Georgi Gerganov 2026-04-07 15:29:17 +03:00
  • 22fc79134e ggml : deprecate GGML_OP_ADD1 (#21363) b8692 Georgi Gerganov 2026-04-07 15:28:27 +03:00
  • 2a619f6fbc ggml: Vulkan build, Linux -- output error string for errno on fork failure (#20868) (#20904) b8691 Tom Overlund 2026-04-07 07:54:55 -04:00
  • edd4d9bca5 vulkan: add FA dequant for q4_1, q5_0, q5_1, iq4_nl (#21029) b8690 mkoker 2026-04-07 07:41:29 -04:00
  • 482192f12d webui : store reasoning_content so it is sent back in subsequent requests (#21249) Aldehir Rojas 2026-04-07 06:32:44 -05:00
  • 71a81f6fcc ggml-cuda : fix CDNA2 compute capability constant for gfx90a (MI210) (#21519) b8688 Antoine Viallon 2026-04-07 12:18:55 +02:00
  • ecce0087da fix: Detect streaming state in reasoning content blocks (#21549) Aleksander Grygier 2026-04-07 12:04:41 +02:00
  • d1f82e382d Fix rtl text rendering (#21382) Kabir08 2026-04-07 15:07:20 +05:30
  • 0988accf82 [SYCL] Add Q8_0 reorder optimization (~3x tg speedup on Intel Arc) (#21527) b8685 PMZFX 2026-04-07 04:12:49 -04:00