common/parser: add proper reasoning tag prefill reading (#20424)

* Implement proper prefill extraction

* Refactor cli parameters, update docs, move reasoning budget sampler part to common/reasoning-budget.cpp

* Update tools/server/server-task.cpp

* refactor: move grammars to variant, remove grammar_external, handle exception internally

* Make code less C++y

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
This commit is contained in:
Piotr Wilkin (ilintar)
2026-03-19 16:58:21 +01:00
committed by GitHub
parent c1258830b2
commit 5e54d51b19
33 changed files with 651 additions and 454 deletions
+3 -3
View File
@@ -907,7 +907,7 @@ If query param `?fail_on_no_slot=1` is set, this endpoint will respond with stat
"chat_format": "GPT-OSS",
"reasoning_format": "none",
"reasoning_in_content": false,
"thinking_forced_open": false,
"generation_prompt": "",
"samplers": [
"penalties",
"dry",
@@ -972,7 +972,7 @@ If query param `?fail_on_no_slot=1` is set, this endpoint will respond with stat
"chat_format": "GPT-OSS",
"reasoning_format": "none",
"reasoning_in_content": false,
"thinking_forced_open": false,
"generation_prompt": "",
"samplers": [
"penalties",
"dry",
@@ -1193,7 +1193,7 @@ The `response_format` parameter supports both plain JSON output (e.g. `{"type":
`reasoning_format`: The reasoning format to be parsed. If set to `none`, it will output the raw generated text.
`thinking_forced_open`: Force a reasoning model to always output the reasoning. Only works on certain models.
`generation_prompt`: The generation prompt that was prefilled in by the template. Prepended to model output before parsing.
`parse_tool_calls`: Whether to parse the generated tool call.