[feature] Video Analysis side panel — transcribe + find + summarize (like Deep Research panel) #926

Open
opened 2026-06-04 12:13:56 +02:00 by sleepy · 0 comments
Owner

Feature Request

Add a Video Analysis side panel to Odysseus, modeled after the existing Deep Research panel (static/js/research/panel.js + research-overlay). Not a chat tool — a dedicated UI section with its own overlay, progress tracking, and results viewer.

The entire backend pipeline already exists and is production-tested at ~/workspace/vod-pipeline/:

  • vod_pipeline.py (548 lines) — download → preprocess → Parakeet transcribe → LLM analysis
  • discord_bot.py (762 lines) — working Discord bot with /url, /retry, /find, /help
  • ~/workspace/clyde-vods/prompts/PROMPT_5hr.md — analysis prompt (chapters, highlights, games, audio pollution filtering)

~95% of the backend code can be reused. The main work is: frontend panel, backend routes, settings integration.

UI Design (follow Deep Research pattern)

Side Panel Button

Like the Deep Research button in the sidebar, add a Video Analysis button that opens a full-screen overlay (vod-overlay).

Overlay Layout

The overlay should have 3 tabs/modes (like DR has running/completed sections):

  1. Analyze — Full pipeline: paste URL → transcribe + analyze → view results

    • Input: YouTube/Twitch URL field + "Analyze" button
    • Progress bar (reuse the bot's progress parsing logic): Download → Transcribe → Analyze
    • Results: transcript viewer + analysis cards (chapters, highlights, games)
    • Downloads: transcript .txt, analysis .json
  2. Find — Timestamp search in a video

    • Input: URL + search query
    • Auto-transcribes if no existing transcript
    • Returns timestamps + context snippets
    • Reuses /find implementation from discord_bot.py:564+
  3. Summarize — Video summary with key points

    • Input: URL
    • Auto-transcribes if no existing transcript
    • Returns structured summary with timestamps for each key point
    • New (does not exist in Discord bot)

History

Like DR keeps data/deep_research/<id>.json, store analyses in data/video_analysis/<id>.json with:

  • url, video_id, video_info (title, channel, date)
  • transcript_path, analysis_path
  • status (pending/transcribing/analyzing/complete/error)
  • timestamps for each phase

Architecture (follow Deep Research pattern)

Frontend

static/js/vod/
  panel.js      — overlay, tabs, progress, results rendering
  jobs.js       — job tracking, polling, history

Backend Routes

routes/vod_routes.py   — /api/vod/* endpoints

Endpoints:

  • POST /api/vod/analyze — Start analysis (download + transcribe + analyze)
  • POST /api/vod/find — Find in video (auto-transcribe if needed, then search)
  • POST /api/vod/summarize — Summarize video (auto-transcribe if needed, then summarize)
  • GET /api/vod/jobs — List all jobs (history)
  • GET /api/vod/jobs/{id} — Get job status + results
  • GET /api/vod/jobs/{id}/transcript — Stream transcript text
  • DELETE /api/vod/jobs/{id} — Delete job + artifacts

Backend Pipeline

src/vod_pipeline.py         — audio download, preprocess, Parakeet transcription (from vod_pipeline.py)
src/vod_analysis.py         — LLM analysis, find, summarize (from discord_bot.py find logic + new summarize)

Settings Keys

"video_analysis_enabled": True,
"video_analysis_model": "",           # model for analysis/find/summarize
"video_analysis_endpoint_id": "",     # endpoint for the model

Model resolution uses Odysseus' existing endpoint_resolver — NOT hardcoded to DeepSeek. Falls back to default_model / default_endpoint_id if video-specific model not configured.

Data Storage

data/video_analysis/
  <job_id>/
    audio_16k_mono.wav     — kept for re-analysis (delete after configurable TTL)
    <video_id>_transcript.txt
    <video_id>_analysis.json
    job.json               — metadata, status, timestamps

Model Resolution

All LLM calls (analysis, find, summarize) go through Odysseus' endpoint resolution:

from src.settings import get_setting
from src.endpoint_resolver import resolve_endpoint

model = get_setting("video_analysis_model") or get_setting("default_model")
endpoint_id = get_setting("video_analysis_endpoint_id") or get_setting("default_endpoint_id")
url, model, headers = resolve_endpoint(endpoint_id)

Uses src.llm_core.llm_call_async for the actual calls instead of raw urllib.request.

Key Differences from Discord Bot

  1. No Discord dependency — strip all discord.py imports, use FastAPI routes + SSE/WebSocket for progress
  2. Model from settings — not hardcoded DeepSeek
  3. Frontend progress — SSE stream or polling (like DR's progress tracking) instead of Discord message edits
  4. Summarize tool — new, doesn't exist in the bot
  5. Results viewer — rendered in the overlay, not just file uploads

Requirements

Python packages (Parakeet transcription):

nemo_toolkit[asr]   # ~2GB, optional — tool disables gracefully if missing
torchaudio
soundfile

System dependencies:

yt-dlp    # on PATH
ffmpeg    # on PATH

Make NeMo an optional import — the panel shows a setup message if not installed.

Pitfalls (from production experience)

  1. Timestamp format: Must use [HH:MM:SS], not [MM:SS:00]. Minutes > 59 breaks duration estimation.
  2. JSON truncation: LLM max_tokens must be 65536 for long videos. 16K truncates mid-object on 4h+ VODs.
  3. Stereo→mono: NeMo needs (batch, time) shape. Convert with np.mean(data, axis=1).
  4. yt-dlp PATH: When running as subprocess, PATH may not include venv bin. Resolve full path at import time: YT_DLP = shutil.which("yt-dlp") or str(Path(sys.executable).parent / "yt-dlp")
  5. Highlight cap: Append instruction to limit to 50 highlights max.
  6. Audio cleanup: Delete original WAV immediately after transcription (2.4GB+). Keep 16kHz mono for reuse.
  7. Long-running: Transcription takes ~1.5 min per hour of video. Must be async with progress reporting via SSE/polling.
  8. Audio pollution: The PROMPT_5hr.md already handles filtering game voice lines, music lyrics, etc.

Reference Code

Source What to reuse
~/workspace/vod-pipeline/vod_pipeline.py download_audio, preprocess_audio, transcribe_parakeet, build_analysis_prompt, call_deepseek_analysis, estimate_duration, get_video_info
~/workspace/vod-pipeline/discord_bot.py /find command logic (FIND_SYSTEM_PROMPT + transcript search), progress parsing, retry logic
~/workspace/clyde-vods/prompts/PROMPT_5hr.md Full analysis prompt with audio pollution filtering, chapter/highlight/game extraction
static/js/research/panel.js Overlay pattern, progress tracking, job history, section collapse
routes/research_routes.py Route pattern for long-running async jobs with progress
src/research_handler.py Job lifecycle pattern (pending → running → complete/error)
  • vod-pipeline-bot Hermes skill has comprehensive architecture docs for all pitfalls
  • #921 — Settings persistence (should be fixed first)
  • #924 — Subagent tool rewrite (video analysis could eventually use role-based model routing)
## Feature Request Add a **Video Analysis** side panel to Odysseus, modeled after the existing **Deep Research** panel (`static/js/research/panel.js` + `research-overlay`). Not a chat tool — a dedicated UI section with its own overlay, progress tracking, and results viewer. The entire backend pipeline already exists and is production-tested at `~/workspace/vod-pipeline/`: - **`vod_pipeline.py`** (548 lines) — download → preprocess → Parakeet transcribe → LLM analysis - **`discord_bot.py`** (762 lines) — working Discord bot with `/url`, `/retry`, `/find`, `/help` - **`~/workspace/clyde-vods/prompts/PROMPT_5hr.md`** — analysis prompt (chapters, highlights, games, audio pollution filtering) ~95% of the backend code can be reused. The main work is: frontend panel, backend routes, settings integration. ## UI Design (follow Deep Research pattern) ### Side Panel Button Like the Deep Research button in the sidebar, add a **Video Analysis** button that opens a full-screen overlay (`vod-overlay`). ### Overlay Layout The overlay should have **3 tabs/modes** (like DR has running/completed sections): 1. **Analyze** — Full pipeline: paste URL → transcribe + analyze → view results - Input: YouTube/Twitch URL field + "Analyze" button - Progress bar (reuse the bot's progress parsing logic): Download → Transcribe → Analyze - Results: transcript viewer + analysis cards (chapters, highlights, games) - Downloads: transcript .txt, analysis .json 2. **Find** — Timestamp search in a video - Input: URL + search query - Auto-transcribes if no existing transcript - Returns timestamps + context snippets - Reuses `/find` implementation from `discord_bot.py:564+` 3. **Summarize** — Video summary with key points - Input: URL - Auto-transcribes if no existing transcript - Returns structured summary with timestamps for each key point - **New** (does not exist in Discord bot) ### History Like DR keeps `data/deep_research/<id>.json`, store analyses in `data/video_analysis/<id>.json` with: - url, video_id, video_info (title, channel, date) - transcript_path, analysis_path - status (pending/transcribing/analyzing/complete/error) - timestamps for each phase ## Architecture (follow Deep Research pattern) ### Frontend ``` static/js/vod/ panel.js — overlay, tabs, progress, results rendering jobs.js — job tracking, polling, history ``` ### Backend Routes ``` routes/vod_routes.py — /api/vod/* endpoints ``` Endpoints: - `POST /api/vod/analyze` — Start analysis (download + transcribe + analyze) - `POST /api/vod/find` — Find in video (auto-transcribe if needed, then search) - `POST /api/vod/summarize` — Summarize video (auto-transcribe if needed, then summarize) - `GET /api/vod/jobs` — List all jobs (history) - `GET /api/vod/jobs/{id}` — Get job status + results - `GET /api/vod/jobs/{id}/transcript` — Stream transcript text - `DELETE /api/vod/jobs/{id}` — Delete job + artifacts ### Backend Pipeline ``` src/vod_pipeline.py — audio download, preprocess, Parakeet transcription (from vod_pipeline.py) src/vod_analysis.py — LLM analysis, find, summarize (from discord_bot.py find logic + new summarize) ``` ### Settings Keys ```python "video_analysis_enabled": True, "video_analysis_model": "", # model for analysis/find/summarize "video_analysis_endpoint_id": "", # endpoint for the model ``` Model resolution uses Odysseus' existing `endpoint_resolver` — NOT hardcoded to DeepSeek. Falls back to `default_model` / `default_endpoint_id` if video-specific model not configured. ### Data Storage ``` data/video_analysis/ <job_id>/ audio_16k_mono.wav — kept for re-analysis (delete after configurable TTL) <video_id>_transcript.txt <video_id>_analysis.json job.json — metadata, status, timestamps ``` ## Model Resolution All LLM calls (analysis, find, summarize) go through Odysseus' endpoint resolution: ```python from src.settings import get_setting from src.endpoint_resolver import resolve_endpoint model = get_setting("video_analysis_model") or get_setting("default_model") endpoint_id = get_setting("video_analysis_endpoint_id") or get_setting("default_endpoint_id") url, model, headers = resolve_endpoint(endpoint_id) ``` Uses `src.llm_core.llm_call_async` for the actual calls instead of raw `urllib.request`. ## Key Differences from Discord Bot 1. **No Discord dependency** — strip all `discord.py` imports, use FastAPI routes + SSE/WebSocket for progress 2. **Model from settings** — not hardcoded DeepSeek 3. **Frontend progress** — SSE stream or polling (like DR's progress tracking) instead of Discord message edits 4. **Summarize tool** — new, doesn't exist in the bot 5. **Results viewer** — rendered in the overlay, not just file uploads ## Requirements Python packages (Parakeet transcription): ``` nemo_toolkit[asr] # ~2GB, optional — tool disables gracefully if missing torchaudio soundfile ``` System dependencies: ``` yt-dlp # on PATH ffmpeg # on PATH ``` Make NeMo an **optional import** — the panel shows a setup message if not installed. ## Pitfalls (from production experience) 1. **Timestamp format**: Must use `[HH:MM:SS]`, not `[MM:SS:00]`. Minutes > 59 breaks duration estimation. 2. **JSON truncation**: LLM `max_tokens` must be 65536 for long videos. 16K truncates mid-object on 4h+ VODs. 3. **Stereo→mono**: NeMo needs `(batch, time)` shape. Convert with `np.mean(data, axis=1)`. 4. **yt-dlp PATH**: When running as subprocess, PATH may not include venv bin. Resolve full path at import time: `YT_DLP = shutil.which("yt-dlp") or str(Path(sys.executable).parent / "yt-dlp")` 5. **Highlight cap**: Append instruction to limit to 50 highlights max. 6. **Audio cleanup**: Delete original WAV immediately after transcription (2.4GB+). Keep 16kHz mono for reuse. 7. **Long-running**: Transcription takes ~1.5 min per hour of video. Must be async with progress reporting via SSE/polling. 8. **Audio pollution**: The PROMPT_5hr.md already handles filtering game voice lines, music lyrics, etc. ## Reference Code | Source | What to reuse | |--------|--------------| | `~/workspace/vod-pipeline/vod_pipeline.py` | download_audio, preprocess_audio, transcribe_parakeet, build_analysis_prompt, call_deepseek_analysis, estimate_duration, get_video_info | | `~/workspace/vod-pipeline/discord_bot.py` | /find command logic (FIND_SYSTEM_PROMPT + transcript search), progress parsing, retry logic | | `~/workspace/clyde-vods/prompts/PROMPT_5hr.md` | Full analysis prompt with audio pollution filtering, chapter/highlight/game extraction | | `static/js/research/panel.js` | Overlay pattern, progress tracking, job history, section collapse | | `routes/research_routes.py` | Route pattern for long-running async jobs with progress | | `src/research_handler.py` | Job lifecycle pattern (pending → running → complete/error) | ## Related - `vod-pipeline-bot` Hermes skill has comprehensive architecture docs for all pitfalls - #921 — Settings persistence (should be fixed first) - #924 — Subagent tool rewrite (video analysis could eventually use role-based model routing)
sleepy changed title from [feature] VOD/Video analysis tool — transcribe + find + summarize as chat tool to [feature] Video Analysis side panel — transcribe + find + summarize (like Deep Research panel) 2026-06-04 12:18:10 +02:00
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
sleepy/odysseus#926
No description provided.