server: support multiple generations from one prompt (OAI "n" option) (#17775)

* backend support

* server: support multiple generations from one prompt (OAI "n" option)

* fix invalid batch

* format oai

* clean up

* disable ctx shift

* add test

* update comments

* fix style

* add n_cmpl to docs [no ci]

* allow using both n_cmpl and n
Xuan-Son Nguyen
2025-12-06 15:54:38 +01:00
committed by GitHub
parent 09c7c50e64
commit c42712b056
7 changed files with 146 additions and 19 deletions
+2
@@ -215,6 +215,8 @@ public:
llama_pos pos,
int32_t seq_id,
size_t & n_tokens_out) const;
server_tokens clone() const;
};
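
The hunk above only shows the declaration of `clone()`. As a rough illustration of why such a method is useful for the "n" option, here is a minimal sketch in which one prompt is copied into several independent token buffers, one per requested completion. `toy_tokens` and `fan_out` are hypothetical names invented for this example; they are not the real `server_tokens` class or any function from this commit, and the real implementation may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for server_tokens (assumption: the real class holds more state).
struct toy_tokens {
    std::vector<int32_t> ids;

    // Return an independent deep copy, mirroring the shape of `clone() const`.
    toy_tokens clone() const {
        toy_tokens copy;
        copy.ids = ids;
        return copy;
    }
};

// Hypothetical helper: produce n independent copies of the prompt,
// one for each requested completion. Returns an empty vector when n <= 0.
std::vector<toy_tokens> fan_out(const toy_tokens & prompt, int n) {
    std::vector<toy_tokens> out;
    if (n > 0) {
        out.reserve(static_cast<std::size_t>(n));
    }
    for (int i = 0; i < n; ++i) {
        out.push_back(prompt.clone());
    }
    return out;
}
```

Because each copy owns its own buffer, appending generated tokens to one completion leaves the original prompt and the sibling completions unchanged. An explicit `clone()` also keeps every copy visible at the call site, which may be one reason to prefer it over implicit copying.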