llama.cpp/tools
Xuan-Son Nguyen f896d2c34f server: improve speed of speculative decoding (#17808)
* server: improve speed of speculative decoding

* fix small draft case

* add link to the PR

* server : fix generation time measurement

* server : fix draft acceptance logs (add SRV_CNT, SLT_CNT macros)

* server : add comment

* add PR to docs

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-12-08 14:35:28 +01:00