llama.cpp

Ryan Goulden 26c9ce1288 server: Add cached_tokens info to oaicompat responses (#19361)

* tests : fix fetch_server_test_models.py

* server: to_json_oaicompat cached_tokens

Adds OpenAI and Anthropic compatible information about the
number of cached prompt tokens used in a response.

2026-03-19 19:09:33 +01:00

test_basic.py

server : support multiple model aliases via comma-separated --alias (#19926)

2026-02-27 07:05:23 +01:00

test_chat_completion.py

server: Add cached_tokens info to oaicompat responses (#19361)

2026-03-19 19:09:33 +01:00

test_compat_anthropic.py

server: Add cached_tokens info to oaicompat responses (#19361)

2026-03-19 19:09:33 +01:00

test_compat_oai_responses.py

server: /v1/responses (partial) (#18486)

2026-01-21 17:47:23 +01:00

test_completion.py

server : fix wait in test_cancel_requests() test (#20601)

2026-03-15 20:54:37 +02:00

test_ctx_shift.py

memory : remove KV cache size padding (#16812)

2025-10-28 20:19:44 +02:00

test_embedding.py

llama : fix pooling assertion crash in chunked GDN detection path (#20468)

2026-03-13 20:53:42 +02:00

test_infill.py

server : support unified cache across slots (#16736)

2025-11-02 18:14:04 +02:00

test_lora.py

server : disable context shift by default (#15416)

2025-08-19 16:46:37 +03:00

test_proxy.py

server: Parse port numbers from MCP server URLs in CORS proxy (#20208)

2026-03-09 17:47:54 +01:00

test_rerank.py

server / ranking : add sorting and management of top_n (#16403)

2025-10-11 16:39:04 +03:00

test_router.py

server: add router multi-model tests (#17704) (#17722)

2025-12-03 15:10:37 +01:00

test_security.py

server: add --media-path for local media files (#17697)

2025-12-02 22:49:20 +01:00

test_sleep.py

server: add auto-sleep after N seconds of idle (#18228)

2025-12-21 02:24:42 +01:00

test_slot_save.py

server : disable context shift by default (#15416)

2025-08-19 16:46:37 +03:00

test_speculative.py

server : adjust spec tests to generate up to 16 tokens (#19093)

2026-01-28 09:11:40 +02:00

test_template.py

tests : use reasoning instead of reasoning_budget in server tests (#20432)

2026-03-12 13:41:01 +01:00

test_tokenize.py

server : disable context shift by default (#15416)

2025-08-19 16:46:37 +03:00

test_tool_call.py

Autoparser - complete refactoring of parser architecture (#18675)

2026-03-06 21:01:00 +01:00

test_vision_api.py

server : speed up tests (#15836)

2025-09-06 14:45:24 +02:00