[embeddings] Failed HTTP endpoint latched for entire process lifetime with no auto-retry #766
Reference
sleepy/odysseus#766
File: `src/embeddings.py`, line 203

    _http_embed_down = False  # process-level latch

Once the HTTP embedding endpoint fails, `_http_embed_down = True` is set for the entire process lifetime. The only way to reset it is calling `reset_http_embed_state()`, which is only triggered by manual admin panel saves. This means:

1. If the embedding endpoint is briefly down during startup, the process runs on FastEmbed forever.
2. If the endpoint recovers, no automatic retry occurs.
3. In a long-running server, this can cause degraded quality (FastEmbed may use a different/smaller model than the configured endpoint).

`rag_singleton.py` has a better pattern: it retries every 30 seconds. `embeddings.py` should adopt a similar approach.

Action: Replace the boolean latch with a time-based retry (e.g., re-probe every N seconds after failure), similar to `rag_singleton.py`'s `_RETRY_INTERVAL`.

Fixed in PR #802: replaced boolean latch with time-based retry (30s re-probe interval).
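For readers unfamiliar with the pattern, here is a minimal sketch of a time-based retry gate of the kind this issue asks for. It is not the code from PR #802 or from `rag_singleton.py`; `EndpointGate`, `embed`, `http_embed`, and `local_embed` are hypothetical names used only for illustration, and the 30-second default is taken from the interval mentioned in this thread.

```python
import time
from typing import Callable, List, Optional, Sequence

_RETRY_INTERVAL = 30.0  # seconds; assumed from the 30s interval discussed in this issue


class EndpointGate:
    """Hypothetical helper: remembers *when* the endpoint failed, not just *that* it failed."""

    def __init__(
        self,
        retry_interval: float = _RETRY_INTERVAL,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._retry_interval = retry_interval
        self._clock = clock  # injectable so tests can control time
        self._failed_at: Optional[float] = None  # None means "believed healthy"

    def should_try_http(self) -> bool:
        # Healthy, or enough time has passed since the last failure to re-probe.
        if self._failed_at is None:
            return True
        return self._clock() - self._failed_at >= self._retry_interval

    def record_failure(self) -> None:
        self._failed_at = self._clock()

    def record_success(self) -> None:
        self._failed_at = None


def embed(
    texts: Sequence[str],
    gate: EndpointGate,
    http_embed: Callable[[Sequence[str]], List],
    local_embed: Callable[[Sequence[str]], List],
) -> List:
    """Try the HTTP endpoint when the gate allows it; otherwise use the local fallback."""
    if gate.should_try_http():
        try:
            result = http_embed(texts)
        except Exception:
            gate.record_failure()  # start (or restart) the back-off window
        else:
            gate.record_success()
            return result
    return local_embed(texts)
```

With this shape, a failure during startup only suppresses HTTP calls for one interval; the first request after the interval re-probes the endpoint and switches back on success. The sketch is not thread-safe; a shared server process would likely want a lock around the gate's state.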