[structural] Module-level mutable globals in llm_core.py create thread-safety issues #709


Closed

opened 2026-06-02 23:57:35 +02:00 by sleepy · 0 comments

sleepy commented

2026-06-02 23:57:35 +02:00

Owner

File: src/llm_core.py

The following module-level mutable dicts have no thread-safety protection:

_response_cache (line 42) — read/written by sync and async calls
_dead_hosts (line 57) — read/written by concurrent stream handlers
_host_fails (line 58) — read/written by concurrent stream handlers
_model_activity (line 59) — read/written by concurrent requests

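In production, asyncio runs alongside a ThreadPoolExecutor (used in model_routes._refresh_caches_bg), so these dicts can be mutated from several threads at once. CPython's GIL keeps a single dict operation from corrupting the dict, but it does not make multi-step sequences atomic. The cache eviction (lines 130–134: check length → collect keys → delete) can interleave with writes from another thread, and iterating a dict while another thread resizes it can raise RuntimeError.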

Fix: Use threading.Lock for the mutable dicts, or replace with thread-safe alternatives:

_response_cache → cachetools.TTLCache guarded by a lock (cachetools caches are not thread-safe on their own; shared access must be synchronized)
_dead_hosts/_host_fails → protect with a lock
_model_activity → protect with a lock

This is related to issue #671 (module-level global mutable state) but specific to the LLM core layer.


sleepy referenced this issue from a commit

2026-06-03 19:57:37 +02:00

[llm] Thread-safe mutable globals in llm_core (#709)

sleepy referenced this issue from a pull request that will close it

2026-06-03 19:58:12 +02:00

[llm] Thread-safe mutable globals in llm_core (#709) #832

sleepy closed this issue

2026-06-03 19:58:23 +02:00