Commit Graph

24 Commits

Author SHA1 Message Date
Ethan Turner fa0b8a70a8 cli: Remove redundant local sampling variables (#20429) (#22264)
This change implements the third requested change in issue 20429.
Because defaults.sampling contains the reasoning budget token count and
the reasoning budget message, it's not necessary to assign them to
struct variables.
2026-04-24 00:53:23 +02:00
Matthias Straka 0dd7f915fd cli : cleanup auto-completion code (#21745) 2026-04-23 15:03:28 +02:00
Piotr Wilkin (ilintar) 8bccdbbff9 chat: fix parallel_tool_calls default setting based on model capabilities, add tests for parallel tool calls and structured outputs (#22217)
* chat: fix parallel_tool_calls default setting based on model capabilities, add tests for parallel tool calls and structured outputs

* Fix ty errors.

* Fix flake8 err
2026-04-22 18:10:56 +02:00
Ethan Turner 750579ff14 common: Refactoring sampler parameters (#20429) (#22233)
This change refactors the reasoning_budget_message parameter from the
common params into the sampling parameters specifically. It also removes
the reasoning_budget common parameter and standardizes on the existing
reasoning_budget_tokens parameter in the sampling configuration.

Issue: https://github.com/ggml-org/llama.cpp/issues/20429
Original PR: https://github.com/ggml-org/llama.cpp/pull/20297
2026-04-22 10:40:19 +02:00
Georgi Gerganov cfe9838d26 fit-params : refactor + add option to output estimated memory per device (#22171)
* fit-params : add option to output estimated memory per device

* cont : minor

* cont : refactor

* cont : move fit params implementation to libcommon

* cont : header

* cont : headers

* cont : codeowners
2026-04-21 09:54:36 +03:00
Sigbjørn Skjæret 30dce2cf29 cli : use get_media_marker (#22017) 2026-04-17 00:12:31 +02:00
Adrien Gallouët 41361c8599 common : move up common_init() and fix Windows UTF-8 logs (#21176)
The build info is now only for debug, so we avoid the duplicate
with `--version`.

The UTF-8 setup at the beginning is needed to avoid logging
garbage on Windows.

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-31 12:53:41 +02:00
Sigbjørn Skjæret c46758d28f cli : add /glob command (#21084)
* add /glob command

* output error when max files reached

* support globbing outside curdir
2026-03-28 02:33:04 +01:00
Aldehir Rojas 59d840209a common : inhibit lazy grammar sampler while reasoning is active (#20970)
* common : inhibit grammar while reasoning budget is active

* cont : update force_pos in accept

* cont : fix tests

* cont : tweak should apply logic

* cont : return early not using grammar sampler

* Add tests

* cont : prevent backend sampling when reasoning budget enabled

* cont : fix typo

---------

Co-authored-by: Piotr Wilkin <piotr.wilkin@syndatis.com>
2026-03-27 18:30:40 +01:00
Piotr Wilkin (ilintar) 5e54d51b19 common/parser: add proper reasoning tag prefill reading (#20424)
* Implement proper prefill extraction

* Refactor cli parameters, update docs, move reasoning budget sampler part to common/reasoning-budget.cpp

* Update tools/server/server-task.cpp

* refactor: move grammars to variant, remove grammar_external, handle exception internally

* Make code less C++y

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-03-19 16:58:21 +01:00
Piotr Wilkin (ilintar) d2ecd2d1cf common/parser: add --skip-chat-parsing to force a pure content parser. (#20289)
* Add `--force-pure-content` to force a pure content parser.

* Update common/arg.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Change parameter name [no ci]

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-03-17 16:16:43 +01:00
Piotr Wilkin (ilintar) 9e2e2198b0 tools/cli: fix disable reasoning (#20606) 2026-03-15 22:40:53 +01:00
Piotr Wilkin (ilintar) acb7c79069 common/parser: handle reasoning budget (#20297)
* v1

* Finished!

* Handlie cli

* Reasoning sampler

* Apply suggestions from code review

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Less explosive terminology :)

* Add utf-8 case and tests

* common : migrate reasoning budget sampler to common

* cont : clean up

* cont : expose state and allow passing as initial state

* cont : remove unused imports

* cont : update state machine doc string

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Alde Rojas <hello@alde.dev>
2026-03-11 10:26:12 +01:00
Piotr Wilkin (ilintar) 566059a26b Autoparser - complete refactoring of parser architecture (#18675)
* Autoparser - full single commit squish

* Final pre-merge changes: minor fixes, Kimi 2.5 model parser
2026-03-06 21:01:00 +01:00
Roj234 f7db3f3789 cli : Don't clear system prompt when using '/clear' (#20067)
* Enhance /clear command to include system prompt

Add system prompt to messages when clearing chat history.

* Use lambda
2026-03-06 06:41:11 +01:00
Sigbjørn Skjæret b5ed0e058c cli : add command and file auto-completion (#19985) 2026-03-05 10:47:28 +01:00
Sigbjørn Skjæret e8e261699a cli : provide model with text filename (#19783) 2026-02-22 22:33:49 +01:00
Sigbjørn Skjæret b2ecc0cdb4 support --verbose-prompt (#19576) 2026-02-13 12:49:10 +01:00
Aldehir Rojas a3e812811d cli : load parser definition (#19031)
* cli : load parser definition

* cont : only unload if a parser is defined
2026-01-22 20:31:22 -06:00
Xuan-Son Nguyen 2c1f199653 cli : fix reasoning responses in CLI (#18961)
* cli : fix reasoning responses in CLI

* fix build

* fix build (2)
2026-01-20 18:23:25 +01:00
Xuan-Son Nguyen 6df686bee6 server : refactor oai_parser_opt, move it to server_chat_params (#18937)
* server_chat_params

* move chat format into CLI

* use meta whenever possible

* clean up, no more chatml fallback
2026-01-19 23:28:01 +01:00
Xuan-Son Nguyen 6ce863c803 server: prevent data race from HTTP threads (#18263)
* server: prevent data race from HTTP threads

* fix params

* fix default_generation_settings

* nits: make handle_completions_impl looks less strange

* stricter const

* fix GGML_ASSERT(idx < states.size())

* move index to be managed by server_response_reader

* http: make sure req & res lifecycle are tied together

* fix compile

* fix index handling buggy

* fix data race for lora endpoint

* nits: fix shadow variable

* nits: revert redundant changes

* nits: correct naming for json_webui_settings
2025-12-22 14:23:34 +01:00
Xuan-Son Nguyen ddcb75dd8a server: add auto-sleep after N seconds of idle (#18228)
* implement sleeping at queue level

* implement server-context suspend

* add test

* add docs

* optimization: add fast path

* make sure to free llama_init

* nits

* fix use-after-free

* allow /models to be accessed during sleeping, fix use-after-free

* don't allow accessing /models during sleep, it is not thread-safe

* fix data race on accessing props and model_meta

* small clean up

* trailing whitespace

* rm outdated comments
2025-12-21 02:24:42 +01:00
Xuan-Son Nguyen 6c2131773c cli: new CLI experience (#17824)
* wip

* wip

* fix logging, add display info

* handle commands

* add args

* wip

* move old cli to llama-completion

* rm deprecation notice

* move server to a shared library

* move ci to llama-completion

* add loading animation

* add --show-timings arg

* add /read command, improve LOG_ERR

* add args for speculative decoding, enable show timings by default

* add arg --image and --audio

* fix windows build

* support reasoning_content

* fix llama2c workflow

* color default is auto

* fix merge conflicts

* properly fix color problem

Co-authored-by: bandoti <bandoti@users.noreply.github.com>

* better loading spinner

* make sure to clean color on force-exit

* also clear input files on "/clear"

* simplify common_log_flush

* add warning in mtmd-cli

* implement console writter

* fix data race

* add attribute

* fix llama-completion and mtmd-cli

* add some notes about console::log

* fix compilation

---------

Co-authored-by: bandoti <bandoti@users.noreply.github.com>
2025-12-10 15:28:59 +01:00