[structural] Module-level mutable globals in llm_core.py create thread-safety issues #709

Closed
opened 2026-06-02 23:57:35 +02:00 by sleepy · 0 comments
Owner

File: src/llm_core.py

The following module-level mutable dicts have no thread-safety protection:

  • _response_cache (line 42) — read/written by sync and async calls
  • _dead_hosts (line 57) — read/written by concurrent stream handlers
  • _host_fails (line 58) — read/written by concurrent stream handlers
  • _model_activity (line 59) — read/written by concurrent requests

In production with asyncio + ThreadPoolExecutor (used in model_routes._refresh_caches_bg), these dicts can be mutated concurrently. CPython's GIL keeps a single dict operation from corrupting the dict, but it does not make multi-step sequences atomic. The cache eviction (lines 130–134: check length → collect keys → delete) is one such sequence: another thread can insert or delete entries between any two of its steps.

Fix: Guard the mutable dicts with threading.Lock, or replace them with purpose-built alternatives:

  • _response_cache → cachetools.TTLCache, with access guarded by a lock (cachetools caches are not thread-safe on their own)
  • _dead_hosts/_host_fails → protect with a lock
  • _model_activity → protect with a lock

This is related to issue #671 (module-level global mutable state) but specific to the LLM core layer.

Reference
sleepy/odysseus#709