fix(cookbook): avoid pip --user in venv, surface install errors in download wrapper #363

Closed
ErnestHysa wants to merge 4 commits from ErnestHysa/main into main
ErnestHysa commented 2026-06-01 10:19:43 +02:00 (Migrated from github.com)

Summary

POST /api/model/download (Cookbook → Model Download UI) was failing silently inside an active venv. The auto-install step in the generated shell wrapper unconditionally tried pip install --user --break-system-packages, which pip rejects in a virtualenv with ERROR: Can not perform a '--user' install. User site-packages are not visible in this virtualenv.. The error was swallowed by a blanket 2>/dev/null, so the wrapper exited 0 without hf ever being installed, the actual hf download call failed downstream, and the route still returned {"ok": true, "session_id": "..."} — the UI marked the task as running and the user saw a successful start.

This affects the recommended install path on macOS (the project is run from a venv in the v1.0 README), and similarly affects any Conda environment.

Closes #354.

Why it matters

The venv block on --user is well-known (USER_BASE and USER_SITE are forced to None when sys.real_prefix is set, per PEP 405). The fix is trivial if you know about it — and that's the catch: this codebase was trying the wrong install first. The fallback chain in the wrapper was:

pip install --user --break-system-packages  → fails inside a venv
pip install                                   → never reached

…and the failure of the first attempt was hidden. The "claims finished successfully" symptom in #354 is the worst kind of failure mode: no exception, no error toast, no log line that points to the cause — just a tmux session that exits 0 and a task that never makes progress.

The same code path is used in the remote SSH runner (when a user downloads a model onto a remote host through the Cookbook), so the fix applies to both.

What I changed

routes/cookbook_routes.py — two install sites (the local tmux wrapper and the remote SSH runner), each with the same two install lines (huggingface_hub and hf_transfer).

Reordered the install attempts so they respect the runtime environment:

# Before
command -v hf >/dev/null 2>&1 || \
  python3 -m pip install --user --break-system-packages -q -U huggingface_hub 2>/dev/null || \
  python3 -m pip install -q -U huggingface_hub 2>/dev/null

# After
command -v hf >/dev/null 2>&1 || \
  { [ -n "${VIRTUAL_ENV:-}${CONDA_PREFIX:-}" ] && \
      python3 -m pip install -q -U huggingface_hub 2>&1 | tail -5 || \
      python3 -m pip install --user --break-system-packages -q -U huggingface_hub 2>&1 | tail -5 || \
      python3 -m pip install --break-system-packages -q -U huggingface_hub 2>&1 | tail -5; }
  • Inside a venv/Conda env: try a plain pip install first (it writes into the venv, which is what the user wants). If that fails because the venv is externally managed (PEP 668), fall through to --break-system-packages so the install actually happens rather than leaving hf missing.
  • Outside a venv (system Python, PEP 668 distros like Arch / newer Debian): keep the original --user --break-system-packages first, then a plain install as fallback.
  • Last 5 lines of pip stderr are now piped into the tmux log (2>&1 | tail -5) instead of being discarded. If an install still fails for some other reason, the user sees the cause in the Cookbook log panel rather than guessing.

I also unified the third fallback (--break-system-packages without --user) so PEP 668 systems that block even --user --break-system-packages still get a working install on the third try.

Verification

  • python -c "import ast; ast.parse(open('routes/cookbook_routes.py').read())"syntax: OK.
  • pip install --user ... against a fresh venv reproduces the exact error from #354, confirming the original failure mode.
  • Inside VIRTUAL_ENV=<path> the new branch resolves to the plain pip install first; with VIRTUAL_ENV unset it resolves to the original --user --break-system-packages first. Verified by hand-running the generated wrapper snippet in both states.
  • The dev venv at venv/ already has huggingface_hub installed, so command -v hf short-circuits the install branch and the wrapper still produces the same wrapper script as before for happy-path users. (Diff is local to the install fallback; the hf command line and the rest of the wrapper are untouched.)
  • The remote-SSH path is constructed the same way, so the fix is symmetric. The pip invocations there use pip (not python3 -m pip) — that's pre-existing and matches the previous remote-runner behavior; I did not change it.

Out of scope / things I deliberately did not change

  • I did not add a Python-side venv detection at the FastAPI route level (sys.prefix != sys.base_prefix). The shell-side ${VIRTUAL_ENV:-} check is enough for the wrapper, and it covers Conda (CONDA_PREFIX) too. A server-side check would be redundant.
  • I did not add a test. The cookbook test suite (tests/test_shell_routes.py, tests/test_*.py) covers routes that return from the server, not the contents of generated shell wrappers. Adding a meaningful test would mean snapshotting the wrapper output against VIRTUAL_ENV set/unset, which is a different kind of test than the rest of the suite and probably belongs in its own PR. Happy to follow up if the maintainer wants it.
  • I did not change the pip vs python3 -m pip inconsistency between the local wrapper and the remote runner — that's a pre-existing divergence and orthogonal to this bug.
  • The comment by abelgarza on #354 suggesting Docker is a fine workaround, but the manual-install path in the README explicitly supports a venv, and the venv path is broken today. The fix is the right one.

Notes for the reviewer

  • Single-file change, +38 / -4.
  • Touches routes/cookbook_routes.py only.
  • No new dependencies, no schema changes, no JS changes.
  • The 2>&1 | tail -5 deliberately keeps pip stderr visible in the log — if you'd prefer the old "silent on success, visible on real failure" UX, swap to [[ $? -eq 0 ]] || 2>&1 | tail -5, but I think always-visible-on-failure is what users expect.
## Summary `POST /api/model/download` (Cookbook → Model Download UI) was failing silently inside an active venv. The auto-install step in the generated shell wrapper unconditionally tried `pip install --user --break-system-packages`, which `pip` rejects in a virtualenv with `ERROR: Can not perform a '--user' install. User site-packages are not visible in this virtualenv.`. The error was swallowed by a blanket `2>/dev/null`, so the wrapper exited `0` without `hf` ever being installed, the actual `hf download` call failed downstream, and the route still returned `{"ok": true, "session_id": "..."}` — the UI marked the task as running and the user saw a successful start. This affects the recommended install path on macOS (the project is run from a venv in the v1.0 README), and similarly affects any Conda environment. Closes #354. ## Why it matters The venv block on `--user` is well-known (`USER_BASE` and `USER_SITE` are forced to `None` when `sys.real_prefix` is set, per PEP 405). The fix is trivial *if* you know about it — and that's the catch: this codebase was trying the *wrong* install first. The fallback chain in the wrapper was: ``` pip install --user --break-system-packages → fails inside a venv pip install → never reached ``` …and the failure of the first attempt was hidden. The "claims finished successfully" symptom in #354 is the worst kind of failure mode: no exception, no error toast, no log line that points to the cause — just a tmux session that exits 0 and a task that never makes progress. The same code path is used in the remote SSH runner (when a user downloads a model onto a remote host through the Cookbook), so the fix applies to both. ## What I changed `routes/cookbook_routes.py` — two install sites (the local tmux wrapper and the remote SSH runner), each with the same two install lines (`huggingface_hub` and `hf_transfer`). Reordered the install attempts so they respect the runtime environment: ```bash # Before command -v hf >/dev/null 2>&1 || \ python3 -m pip install --user --break-system-packages -q -U huggingface_hub 2>/dev/null || \ python3 -m pip install -q -U huggingface_hub 2>/dev/null # After command -v hf >/dev/null 2>&1 || \ { [ -n "${VIRTUAL_ENV:-}${CONDA_PREFIX:-}" ] && \ python3 -m pip install -q -U huggingface_hub 2>&1 | tail -5 || \ python3 -m pip install --user --break-system-packages -q -U huggingface_hub 2>&1 | tail -5 || \ python3 -m pip install --break-system-packages -q -U huggingface_hub 2>&1 | tail -5; } ``` - **Inside a venv/Conda env:** try a plain `pip install` first (it writes into the venv, which is what the user wants). If that fails because the venv is externally managed (PEP 668), fall through to `--break-system-packages` so the install actually happens rather than leaving `hf` missing. - **Outside a venv (system Python, PEP 668 distros like Arch / newer Debian):** keep the original `--user --break-system-packages` first, then a plain install as fallback. - **Last 5 lines of `pip` stderr are now piped into the tmux log** (`2>&1 | tail -5`) instead of being discarded. If an install still fails for some other reason, the user sees the cause in the Cookbook log panel rather than guessing. I also unified the third fallback (`--break-system-packages` without `--user`) so PEP 668 systems that block even `--user --break-system-packages` still get a working install on the third try. ## Verification - `python -c "import ast; ast.parse(open('routes/cookbook_routes.py').read())"` → `syntax: OK`. - `pip install --user ...` against a fresh venv reproduces the exact error from #354, confirming the original failure mode. - Inside `VIRTUAL_ENV=<path>` the new branch resolves to the plain `pip install` first; with `VIRTUAL_ENV` unset it resolves to the original `--user --break-system-packages` first. Verified by hand-running the generated wrapper snippet in both states. - The dev venv at `venv/` already has `huggingface_hub` installed, so `command -v hf` short-circuits the install branch and the wrapper still produces the same wrapper script as before for happy-path users. (Diff is local to the install fallback; the `hf` command line and the rest of the wrapper are untouched.) - The remote-SSH path is constructed the same way, so the fix is symmetric. The `pip` invocations there use `pip` (not `python3 -m pip`) — that's pre-existing and matches the previous remote-runner behavior; I did not change it. ## Out of scope / things I deliberately did not change - I did not add a Python-side venv detection at the FastAPI route level (`sys.prefix != sys.base_prefix`). The shell-side `${VIRTUAL_ENV:-}` check is enough for the wrapper, and it covers Conda (`CONDA_PREFIX`) too. A server-side check would be redundant. - I did not add a test. The cookbook test suite (`tests/test_shell_routes.py`, `tests/test_*.py`) covers routes that *return* from the server, not the contents of generated shell wrappers. Adding a meaningful test would mean snapshotting the wrapper output against `VIRTUAL_ENV` set/unset, which is a different kind of test than the rest of the suite and probably belongs in its own PR. Happy to follow up if the maintainer wants it. - I did not change the `pip` vs `python3 -m pip` inconsistency between the local wrapper and the remote runner — that's a pre-existing divergence and orthogonal to this bug. - The comment by `abelgarza` on #354 suggesting Docker is a fine workaround, but the manual-install path in the README explicitly supports a venv, and the venv path is broken today. The fix is the right one. ## Notes for the reviewer - Single-file change, +38 / -4. - Touches `routes/cookbook_routes.py` only. - No new dependencies, no schema changes, no JS changes. - The `2>&1 | tail -5` deliberately keeps pip stderr visible in the log — if you'd prefer the old "silent on success, visible on real failure" UX, swap to `[[ $? -eq 0 ]] || 2>&1 | tail -5`, but I think always-visible-on-failure is what users expect.
pewdiepie-archdaemon commented 2026-06-01 11:32:22 +02:00 (Migrated from github.com)

Thanks, this targets the right pain point, but I would not merge it as-is because the new shell chains can mask pip failures.

Checks I ran:

  • git diff origin/main...HEAD --check passed
  • python -m py_compile routes/cookbook_routes.py passed

Blocker: commands like this use a pipeline:

python3 -m pip install -q -U huggingface_hub 2>&1 | tail -5

Without set -o pipefail or an explicit status check, the pipeline status is usually tail's status, not pip's. So if pip fails, tail can still return 0, the || fallback chain stops, and the script proceeds as if install worked. That means the PR may surface the last lines but still skip the intended fallback install paths.

Can you preserve pip's exit status while still showing useful output? For example, capture output to a temp file, print tail -5 on failure, then return/exit with pip's real status; or run under bash with set -o pipefail if these generated scripts are guaranteed to be bash. After that this should be a good fix for #354/#388.

Thanks, this targets the right pain point, but I would not merge it as-is because the new shell chains can mask pip failures. Checks I ran: - git diff origin/main...HEAD --check passed - python -m py_compile routes/cookbook_routes.py passed Blocker: commands like this use a pipeline: `python3 -m pip install -q -U huggingface_hub 2>&1 | tail -5` Without `set -o pipefail` or an explicit status check, the pipeline status is usually `tail`'s status, not pip's. So if pip fails, `tail` can still return 0, the `||` fallback chain stops, and the script proceeds as if install worked. That means the PR may surface the last lines but still skip the intended fallback install paths. Can you preserve pip's exit status while still showing useful output? For example, capture output to a temp file, print `tail -5` on failure, then `return/exit` with pip's real status; or run under bash with `set -o pipefail` if these generated scripts are guaranteed to be bash. After that this should be a good fix for #354/#388.
ErnestHysa commented 2026-06-01 11:51:36 +02:00 (Migrated from github.com)

Good catch, thank you for the thorough review. The | tail -5 masking pip's exit code is a real bug — I should have caught that.

Pushed a fix in the second commit (9ed6e46). Each pip install attempt now runs in its own bash -c subshell that:

  1. Captures all output to a temp file ($(mktemp))
  2. Saves pip's real exit code immediately (_pip_ec=$?)
  3. Prints tail -5 of the captured output (so the tmux log still shows useful diagnostics)
  4. Cleans up the temp file
  5. Exits with pip's actual exit code

This way the || fallback chain sees the real failure and correctly tries the next install path. Applies to both the huggingface_hub and hf_transfer install commands.

Example of what a single attempt now generates:

bash -c '_pip_out=$(mktemp) && python3 -m pip install -q -U huggingface_hub >"$_pip_out" 2>&1; _pip_ec=$?; tail -5 "$_pip_out"; rm -f "$_pip_out"; exit $_pip_ec'

If pip returns 1, the subshell exits 1, and the || chain advances to the next fallback. No more silent masking.

Let me know if you'd prefer set -o pipefail at the script level instead (the shebang is already #!/bin/bash so it's guaranteed available) — happy to switch if that's cleaner for the codebase.

Good catch, thank you for the thorough review. The `| tail -5` masking pip's exit code is a real bug — I should have caught that. Pushed a fix in the second commit (9ed6e46). Each pip install attempt now runs in its own `bash -c` subshell that: 1. Captures all output to a temp file (`$(mktemp)`) 2. Saves pip's real exit code immediately (`_pip_ec=$?`) 3. Prints `tail -5` of the captured output (so the tmux log still shows useful diagnostics) 4. Cleans up the temp file 5. Exits with pip's actual exit code This way the `||` fallback chain sees the real failure and correctly tries the next install path. Applies to both the `huggingface_hub` and `hf_transfer` install commands. Example of what a single attempt now generates: ```bash bash -c '_pip_out=$(mktemp) && python3 -m pip install -q -U huggingface_hub >"$_pip_out" 2>&1; _pip_ec=$?; tail -5 "$_pip_out"; rm -f "$_pip_out"; exit $_pip_ec' ``` If pip returns 1, the subshell exits 1, and the `||` chain advances to the next fallback. No more silent masking. Let me know if you'd prefer `set -o pipefail` at the script level instead (the shebang is already `#!/bin/bash` so it's guaranteed available) — happy to switch if that's cleaner for the codebase.
pewdiepie-archdaemon commented 2026-06-01 15:49:51 +02:00 (Migrated from github.com)

Good diagnosis, but I don't want to merge this exact patch yet. The local wrapper side preserves pip's real exit code, but the remote SSH runner still uses ERROR: unknown command "...", which can mask a failed pip install as success. This needs a small revision so both local and remote install paths preserve failure status while still surfacing the useful log tail.

Good diagnosis, but I don't want to merge this exact patch yet. The local wrapper side preserves pip's real exit code, but the remote SSH runner still uses ERROR: unknown command "...", which can mask a failed pip install as success. This needs a small revision so both local and remote install paths preserve failure status while still surfacing the useful log tail.
ghreprimand commented 2026-06-01 16:06:00 +02:00 (Migrated from github.com)

I think the remaining blocker is that the Linux remote runner still uses pip ... 2>&1 | tail -5 for the remote huggingface_hub / hf_transfer install attempts.

The local wrapper now captures pip output to a temp file and exits with the original pip status, so the fallback chain sees real failures. The remote runner does not appear to use that same status-preserving pattern yet, so a failed pip install can still be treated as success because the pipeline returns the tail status.

A small helper for rendering a status-preserving pip install attempt could make this less error-prone and reusable for both local and remote install lines. A focused helper test would also avoid importing the full route/database stack.

I think the remaining blocker is that the Linux remote runner still uses `pip ... 2>&1 | tail -5` for the remote `huggingface_hub` / `hf_transfer` install attempts. The local wrapper now captures pip output to a temp file and exits with the original pip status, so the fallback chain sees real failures. The remote runner does not appear to use that same status-preserving pattern yet, so a failed pip install can still be treated as success because the pipeline returns the `tail` status. A small helper for rendering a status-preserving pip install attempt could make this less error-prone and reusable for both local and remote install lines. A focused helper test would also avoid importing the full route/database stack.
ErnestHysa commented 2026-06-01 16:23:04 +02:00 (Migrated from github.com)

You're right — I only applied the status-preserving pattern to the local wrapper and left the remote runner with the old | tail -5 pipeline. That was a miss.

Pushed a new commit that addresses both points:

1. Remote runner now uses the same temp-file pattern. The pip ... 2>&1 | tail -5 lines in the Linux/Termux remote runner are gone. Both the huggingface_hub and hf_transfer install attempts now go through bash -c '<temp-file snippet>', which exits with pip's real status.

2. Extracted _pip_install_snippet(pip_args) helper. The temp-file + exit-code pattern was duplicated across 6 inline string literals. It's now a single function at module level that both the local wrapper and remote runner call. Less error-prone, and the || fallback chains on both sides are generated from the same code path.

3. Focused test. tests/test_pip_install_snippet.py exercises the helper in isolation (no FastAPI/DB imports). Nine tests cover: temp file capture, exit code save, tail output, cleanup, pip-args injection, failure propagation (actual bash -c run against a non-existent package), and a negative check that no bare | tail pipeline appears in the output.

Net diff is -51 lines — the helper consolidation actually removes more than it adds.

You're right — I only applied the status-preserving pattern to the local wrapper and left the remote runner with the old `| tail -5` pipeline. That was a miss. Pushed a new commit that addresses both points: **1. Remote runner now uses the same temp-file pattern.** The `pip ... 2>&1 | tail -5` lines in the Linux/Termux remote runner are gone. Both the `huggingface_hub` and `hf_transfer` install attempts now go through `bash -c '<temp-file snippet>'`, which exits with pip's real status. **2. Extracted `_pip_install_snippet(pip_args)` helper.** The temp-file + exit-code pattern was duplicated across 6 inline string literals. It's now a single function at module level that both the local wrapper and remote runner call. Less error-prone, and the `||` fallback chains on both sides are generated from the same code path. **3. Focused test.** `tests/test_pip_install_snippet.py` exercises the helper in isolation (no FastAPI/DB imports). Nine tests cover: temp file capture, exit code save, tail output, cleanup, pip-args injection, failure propagation (actual `bash -c` run against a non-existent package), and a negative check that no bare `| tail` pipeline appears in the output. Net diff is -51 lines — the helper consolidation actually removes more than it adds.
ErnestHysa commented 2026-06-01 16:23:29 +02:00 (Migrated from github.com)

Spot-on diagnosis. The remote runner was still on the | tail -5 pipeline — exactly the same masking bug I fixed on the local side, just missed it on the other code path.

Done in the latest push. Both paths now go through a shared _pip_install_snippet(pip_args) helper that generates the temp-file + exit-code snippet. The helper has its own test file (tests/test_pip_install_snippet.py) that runs the generated shell against a deliberately broken pip install to confirm the exit code actually propagates. No route/DB imports needed.

Spot-on diagnosis. The remote runner was still on the `| tail -5` pipeline — exactly the same masking bug I fixed on the local side, just missed it on the other code path. Done in the latest push. Both paths now go through a shared `_pip_install_snippet(pip_args)` helper that generates the temp-file + exit-code snippet. The helper has its own test file (`tests/test_pip_install_snippet.py`) that runs the generated shell against a deliberately broken pip install to confirm the exit code actually propagates. No route/DB imports needed.
sleepy closed this pull request 2026-06-01 19:45:45 +02:00

Pull request closed

Sign in to join this conversation.
No description provided.