fix(cookbook): avoid pip --user in venv, surface install errors in download wrapper #363
Closed
ErnestHysa wants to merge 4 commits from
ErnestHysa/main into main
pull from: ErnestHysa/main
merge into: sleepy:main
sleepy:main
sleepy:dev
sleepy:fix/effective-mode-undefined
sleepy:fix/async-generator-500
sleepy:refactor/split-small-routes-779
sleepy:refactor/split-session-routes-778
sleepy:refactor/split-llm-core-700
sleepy:refactor/split-chat-routes-724
sleepy:refactor/split-model-routes-701
sleepy:refactor/split-skills-routes-777
sleepy:refactor/split-document-routes-769
sleepy:refactor/split-ai-interaction-722
sleepy:refactor/split-cookbook-routes-775
sleepy:refactor/split-builtin-actions-723
sleepy:refactor/split-task-scheduler-774
sleepy:refactor/tool-execution-split-667
sleepy:fix/rag-health-coordinator-756
sleepy:fix/agent-core-missing-round-loop
sleepy:fix/rag-health-check-756
sleepy:fix/missing-readmes-782
sleepy:fix/search-files-split-772
sleepy:fix/dedup-search-impl-771
sleepy:fix/mcp-integrations-split-781
sleepy:fix/cookbook-helpers-split-785
sleepy:fix/search-cross-feature-776
sleepy:fix/graceful-llm-degradation-770
sleepy:fix/dedup-research-handler-773
sleepy:fix/cross-route-coupling-780
sleepy:fix/skills-typed-request-762
sleepy:fix/chat-stream-pydantic-731
sleepy:fix/chat-handler-imports-737
sleepy:fix/public-llm-exports-730
sleepy:fix/dedup-search-content-784
sleepy:fix/dedup-stream-parse-735
sleepy:fix/stream-error-boundary-734
sleepy:fix/typed-event-names-786
sleepy:fix/dedup-js-utilities-792
sleepy:fix/skills-index-lookup-761
sleepy:fix/xss-charname-innerhtml-788
sleepy:fix/coderunner-document-write-789
sleepy:fix/ai-interaction-thread-safety-729
sleepy:fix/client-side-api-key-regex-793
sleepy:fix/dead-loadingtext-796
sleepy:fix/active-streams-eviction-736
sleepy:fix/builtin-actions-sqlite-orm-732
sleepy:fix/llm-core-thread-safety-709
sleepy:fix/singleton-thread-safety-768
sleepy:fix/video-upload-multimodal-826
sleepy:fix/mic-button-audio-825
sleepy:fix/multimodal-capabilities-827
sleepy:fix/dedup-system-msg-717
sleepy:fix/service-dead-calls-750
sleepy:fix/db-session-context-manager-757
sleepy:fix/dedup-llmconfig-710
sleepy:fix/keyword-lists-764-v2
sleepy:fix/keyword-lists-764
sleepy:fix/llm-cache-ttl-708
sleepy:fix/validate-config-711
sleepy:fix/totp-constant-time-689
sleepy:fix/admin-tools-constant-681
sleepy:fix/dedup-truncate-670
sleepy:fix/threadsafe-extractions-audit-754
sleepy:fix/dedup-json-io-767
sleepy:fix/695-deduplicate-admin-check
sleepy:fix/upload-auth-dedup-765
sleepy:fix/705-deduplicate-network-constants
sleepy:fix/deduplicate-chunking-749
sleepy:fix/760-deduplicate-embed
sleepy:fix/dedup-validation-759
sleepy:fix/748-deduplicate-tokenization
sleepy:fix/add-readmes-728-742-782
sleepy:fix/key-file-permissions-706
sleepy:fix/embedding-retry-766
sleepy:fix/deterministic-doc-ids-753
sleepy:fix/search-facade-783
sleepy:fix/remove-dead-memory-746
sleepy:fix/remove-rag-manager-747
sleepy:fix/dead-docstring-763
sleepy:refactor/split-tool-schemas-666
sleepy:refactor/split-agent-loop-664
sleepy:refactor/split-tool-implementations-665
sleepy:fix/mark-stopped-500-663
sleepy:fix/streamingtts-scope-662
sleepy:fix/code-block-tool-parsing-661
No reviewers
Labels
Clear labels
area:chat
area:core
area:llm
area:routes
area:tools
bug
Something isn't working
documentation
Improvements or additions to documentation
duplicate
This issue or pull request already exists
enhancement
New feature or request
good first issue
Good for newcomers
help wanted
Extra attention is needed
invalid
This doesn't seem right
question
Further information is requested
refactor
wontfix
This will not be worked on
No labels
area:chat
area:core
area:llm
area:routes
area:tools
bug
documentation
duplicate
enhancement
good first issue
help wanted
invalid
question
refactor
wontfix
Milestone
Clear milestone
No items
No milestone
Projects
Clear projects
No items
No project
Assignees
Clear assignees
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".
No due date set.
Dependencies
No dependencies set.
Reference
sleepy/odysseus!363
Loading…
Reference in a new issue
No description provided.
Delete branch "ErnestHysa/main"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
POST /api/model/download(Cookbook → Model Download UI) was failing silently inside an active venv. The auto-install step in the generated shell wrapper unconditionally triedpip install --user --break-system-packages, whichpiprejects in a virtualenv withERROR: Can not perform a '--user' install. User site-packages are not visible in this virtualenv.. The error was swallowed by a blanket2>/dev/null, so the wrapper exited0withouthfever being installed, the actualhf downloadcall failed downstream, and the route still returned{"ok": true, "session_id": "..."}— the UI marked the task as running and the user saw a successful start.This affects the recommended install path on macOS (the project is run from a venv in the v1.0 README), and similarly affects any Conda environment.
Closes #354.
Why it matters
The venv block on
--useris well-known (USER_BASEandUSER_SITEare forced toNonewhensys.real_prefixis set, per PEP 405). The fix is trivial if you know about it — and that's the catch: this codebase was trying the wrong install first. The fallback chain in the wrapper was:…and the failure of the first attempt was hidden. The "claims finished successfully" symptom in #354 is the worst kind of failure mode: no exception, no error toast, no log line that points to the cause — just a tmux session that exits 0 and a task that never makes progress.
The same code path is used in the remote SSH runner (when a user downloads a model onto a remote host through the Cookbook), so the fix applies to both.
What I changed
routes/cookbook_routes.py— two install sites (the local tmux wrapper and the remote SSH runner), each with the same two install lines (huggingface_hubandhf_transfer).Reordered the install attempts so they respect the runtime environment:
pip installfirst (it writes into the venv, which is what the user wants). If that fails because the venv is externally managed (PEP 668), fall through to--break-system-packagesso the install actually happens rather than leavinghfmissing.--user --break-system-packagesfirst, then a plain install as fallback.pipstderr are now piped into the tmux log (2>&1 | tail -5) instead of being discarded. If an install still fails for some other reason, the user sees the cause in the Cookbook log panel rather than guessing.I also unified the third fallback (
--break-system-packageswithout--user) so PEP 668 systems that block even--user --break-system-packagesstill get a working install on the third try.Verification
python -c "import ast; ast.parse(open('routes/cookbook_routes.py').read())"→syntax: OK.pip install --user ...against a fresh venv reproduces the exact error from #354, confirming the original failure mode.VIRTUAL_ENV=<path>the new branch resolves to the plainpip installfirst; withVIRTUAL_ENVunset it resolves to the original--user --break-system-packagesfirst. Verified by hand-running the generated wrapper snippet in both states.venv/already hashuggingface_hubinstalled, socommand -v hfshort-circuits the install branch and the wrapper still produces the same wrapper script as before for happy-path users. (Diff is local to the install fallback; thehfcommand line and the rest of the wrapper are untouched.)pipinvocations there usepip(notpython3 -m pip) — that's pre-existing and matches the previous remote-runner behavior; I did not change it.Out of scope / things I deliberately did not change
sys.prefix != sys.base_prefix). The shell-side${VIRTUAL_ENV:-}check is enough for the wrapper, and it covers Conda (CONDA_PREFIX) too. A server-side check would be redundant.tests/test_shell_routes.py,tests/test_*.py) covers routes that return from the server, not the contents of generated shell wrappers. Adding a meaningful test would mean snapshotting the wrapper output againstVIRTUAL_ENVset/unset, which is a different kind of test than the rest of the suite and probably belongs in its own PR. Happy to follow up if the maintainer wants it.pipvspython3 -m pipinconsistency between the local wrapper and the remote runner — that's a pre-existing divergence and orthogonal to this bug.abelgarzaon #354 suggesting Docker is a fine workaround, but the manual-install path in the README explicitly supports a venv, and the venv path is broken today. The fix is the right one.Notes for the reviewer
routes/cookbook_routes.pyonly.2>&1 | tail -5deliberately keeps pip stderr visible in the log — if you'd prefer the old "silent on success, visible on real failure" UX, swap to[[ $? -eq 0 ]] || 2>&1 | tail -5, but I think always-visible-on-failure is what users expect.Thanks, this targets the right pain point, but I would not merge it as-is because the new shell chains can mask pip failures.
Checks I ran:
Blocker: commands like this use a pipeline:
python3 -m pip install -q -U huggingface_hub 2>&1 | tail -5Without
set -o pipefailor an explicit status check, the pipeline status is usuallytail's status, not pip's. So if pip fails,tailcan still return 0, the||fallback chain stops, and the script proceeds as if install worked. That means the PR may surface the last lines but still skip the intended fallback install paths.Can you preserve pip's exit status while still showing useful output? For example, capture output to a temp file, print
tail -5on failure, thenreturn/exitwith pip's real status; or run under bash withset -o pipefailif these generated scripts are guaranteed to be bash. After that this should be a good fix for #354/#388.Good catch, thank you for the thorough review. The
| tail -5masking pip's exit code is a real bug — I should have caught that.Pushed a fix in the second commit (9ed6e46). Each pip install attempt now runs in its own
bash -csubshell that:$(mktemp))_pip_ec=$?)tail -5of the captured output (so the tmux log still shows useful diagnostics)This way the
||fallback chain sees the real failure and correctly tries the next install path. Applies to both thehuggingface_hubandhf_transferinstall commands.Example of what a single attempt now generates:
If pip returns 1, the subshell exits 1, and the
||chain advances to the next fallback. No more silent masking.Let me know if you'd prefer
set -o pipefailat the script level instead (the shebang is already#!/bin/bashso it's guaranteed available) — happy to switch if that's cleaner for the codebase.Good diagnosis, but I don't want to merge this exact patch yet. The local wrapper side preserves pip's real exit code, but the remote SSH runner still uses ERROR: unknown command "...", which can mask a failed pip install as success. This needs a small revision so both local and remote install paths preserve failure status while still surfacing the useful log tail.
I think the remaining blocker is that the Linux remote runner still uses
pip ... 2>&1 | tail -5for the remotehuggingface_hub/hf_transferinstall attempts.The local wrapper now captures pip output to a temp file and exits with the original pip status, so the fallback chain sees real failures. The remote runner does not appear to use that same status-preserving pattern yet, so a failed pip install can still be treated as success because the pipeline returns the
tailstatus.A small helper for rendering a status-preserving pip install attempt could make this less error-prone and reusable for both local and remote install lines. A focused helper test would also avoid importing the full route/database stack.
You're right — I only applied the status-preserving pattern to the local wrapper and left the remote runner with the old
| tail -5pipeline. That was a miss.Pushed a new commit that addresses both points:
1. Remote runner now uses the same temp-file pattern. The
pip ... 2>&1 | tail -5lines in the Linux/Termux remote runner are gone. Both thehuggingface_hubandhf_transferinstall attempts now go throughbash -c '<temp-file snippet>', which exits with pip's real status.2. Extracted
_pip_install_snippet(pip_args)helper. The temp-file + exit-code pattern was duplicated across 6 inline string literals. It's now a single function at module level that both the local wrapper and remote runner call. Less error-prone, and the||fallback chains on both sides are generated from the same code path.3. Focused test.
tests/test_pip_install_snippet.pyexercises the helper in isolation (no FastAPI/DB imports). Nine tests cover: temp file capture, exit code save, tail output, cleanup, pip-args injection, failure propagation (actualbash -crun against a non-existent package), and a negative check that no bare| tailpipeline appears in the output.Net diff is -51 lines — the helper consolidation actually removes more than it adds.
Spot-on diagnosis. The remote runner was still on the
| tail -5pipeline — exactly the same masking bug I fixed on the local side, just missed it on the other code path.Done in the latest push. Both paths now go through a shared
_pip_install_snippet(pip_args)helper that generates the temp-file + exit-code snippet. The helper has its own test file (tests/test_pip_install_snippet.py) that runs the generated shell against a deliberately broken pip install to confirm the exit code actually propagates. No route/DB imports needed.Pull request closed