llama.cpp

Files

Ben Racicot c1b911654a server: fix router mode deadlock on child crash and TOCTOU race in models_max (#20763)

Two bugs in `server_models::load()` that affect router mode reliability:

**Bug 1: Deadlock when child process crashes**

When a child process is killed (e.g., SIGKILL from OS code signature
validation), the monitoring thread deadlocks on `stopping_thread.join()`
because the stopping_thread's wait predicate (`is_stopping`) is never
satisfied — the model name was never inserted into `stopping_models`.
`update_status()` is never reached and the model stays stuck in LOADING
state permanently.

Fix: extend the stopping_thread's wait predicate to also wake when the
child process is no longer alive (`!subprocess_alive()`). When woken by
a dead child, the thread skips the shutdown sequence and returns
immediately. The original `stopping_models.erase()` logic is preserved
for normal unloads.
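
The extended wait predicate can be illustrated with a minimal, self-contained sketch. It is not the code from the commit: `model_watch`, `wait_for_stop`, `request_stop`, `mark_child_dead`, and `run_waiter` are hypothetical names, and the sketch assumes that whoever detects the dead child also notifies the condition variable.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <set>
#include <string>
#include <thread>

// Hypothetical stand-in for the router's per-model state (not the real server_models).
struct model_watch {
    std::mutex mtx;
    std::condition_variable cv;
    std::set<std::string> stopping_models; // names with a pending unload request
    bool child_alive = true;               // guarded by mtx

    // Blocks until an unload is requested OR the child is gone.
    // Returns true for a normal unload, false if the child died first.
    bool wait_for_stop(const std::string & name) {
        std::unique_lock<std::mutex> lk(mtx);
        cv.wait(lk, [&] {
            return stopping_models.count(name) > 0 || !child_alive;
        });
        if (stopping_models.count(name) == 0) {
            return false; // woken by a dead child: skip the shutdown sequence
        }
        stopping_models.erase(name); // normal unload path
        return true;
    }

    void request_stop(const std::string & name) {
        {
            std::lock_guard<std::mutex> lk(mtx);
            stopping_models.insert(name);
        }
        cv.notify_all();
    }

    void mark_child_dead() {
        {
            std::lock_guard<std::mutex> lk(mtx);
            child_alive = false;
        }
        cv.notify_all();
    }
};

// Runs the waiter on its own thread and reports how it was woken.
bool run_waiter(bool crash) {
    model_watch w;
    bool graceful = false;
    std::thread stopping_thread([&] { graceful = w.wait_for_stop("m"); });
    if (crash) {
        w.mark_child_dead();
    } else {
        w.request_stop("m");
    }
    stopping_thread.join(); // returns in both cases; the old predicate would hang on a crash
    return graceful;
}
```

With only the `stopping_models` check in the predicate, the crash case would never wake the waiter and `join()` would block forever, which is the deadlock described above.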

**Bug 2: TOCTOU race bypasses `--models-max` (ref #20137)**

`unload_lru()` is called outside the mutex, then `load()` acquires the
lock afterward. Under concurrent requests, multiple threads can each
observe free capacity and all proceed to load, exceeding the limit.

Fix: re-check capacity under the lock after `unload_lru()` returns.
If another thread filled the slot in the window between `unload_lru()`
and the lock acquisition, reject with an error instead of silently
exceeding the limit.
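
The re-check pattern can be sketched as follows. This is an illustration under stated assumptions, not the commit's code: `model_registry`, `loaded_count`, and `concurrent_loads` are hypothetical, and `unload_lru()` is reduced to a placeholder that evicts nothing, so only the check under the lock decides who may load.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <set>
#include <stdexcept>
#include <string>
#include <thread>
#include <vector>

// Hypothetical registry (not the real server_models class).
class model_registry {
public:
    explicit model_registry(std::size_t models_max) : models_max_(models_max) {}

    // Throws std::runtime_error if the limit is already reached.
    void load(const std::string & name) {
        unload_lru(); // runs before the lock is taken, as in the described flow

        std::lock_guard<std::mutex> lk(mtx_);
        if (loaded_.count(name) > 0) {
            return; // already loaded
        }
        // Re-check capacity under the lock: another thread may have taken
        // the free slot in the window since unload_lru() returned.
        if (loaded_.size() >= models_max_) {
            throw std::runtime_error("model limit reached");
        }
        loaded_.insert(name);
    }

    std::size_t loaded_count() {
        std::lock_guard<std::mutex> lk(mtx_);
        return loaded_.size();
    }

private:
    // Placeholder: in this sketch every model is busy, so nothing is evicted.
    void unload_lru() {}

    std::size_t models_max_;
    std::mutex mtx_;
    std::set<std::string> loaded_;
};

// Fires n_threads concurrent loads of distinct models; returns how many succeeded.
std::size_t concurrent_loads(model_registry & reg, int n_threads) {
    std::atomic<std::size_t> ok{0};
    std::vector<std::thread> threads;
    for (int i = 0; i < n_threads; ++i) {
        threads.emplace_back([&reg, &ok, i] {
            try {
                reg.load("model-" + std::to_string(i));
                ++ok;
            } catch (const std::runtime_error &) {
                // rejected: the limit was reached by another thread
            }
        });
    }
    for (auto & t : threads) {
        t.join();
    }
    return ok.load();
}
```

Because the capacity check and the insertion happen in the same critical section, the number of successful loads cannot exceed the configured maximum regardless of thread scheduling.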

2026-03-19 22:16:05 +01:00

batched-bench

Fix locale-dependent float printing in GGUF metadata (#17331)

2026-03-04 09:30:40 +01:00

cli

common/parser: add proper reasoning tag prefill reading (#20424)

2026-03-19 16:58:21 +01:00

completion

common/parser: add --skip-chat-parsing to force a pure content parser (#20289)

2026-03-17 16:16:43 +01:00

cvector-generator

chore : correct typos [no ci] (#20041)