[search] content.py diverges significantly between services/search/ and src/search/ — security risk #784

Closed
opened 2026-06-03 00:32:14 +02:00 by sleepy (Owner) · 0 comments

Problem

services/search/content.py (360 lines) and src/search/content.py (402 lines) have 160 lines of diff, making this the most divergent pair in the duplicated search modules.
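A small difflib script can reproduce a divergence count like this and track it while the copies are merged. This is a sketch, not how the 160-line figure was produced. Its counts can differ slightly from `git diff`, because `SequenceMatcher` may align lines differently. The helper names `count_changed_lines` and `compare_files` are hypothetical.

```python
import difflib
from pathlib import Path


def count_changed_lines(old: str, new: str) -> tuple[int, int]:
    """Return (removed, added) line counts needed to turn old into new."""
    matcher = difflib.SequenceMatcher(
        None, old.splitlines(), new.splitlines(), autojunk=False
    )
    removed = added = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # A "replace" opcode counts as both removed and added lines,
        # the same way a unified diff shows them as "-" and "+" lines.
        if tag in ("replace", "delete"):
            removed += i2 - i1
        if tag in ("replace", "insert"):
            added += j2 - j1
    return removed, added


def compare_files(path_a: str, path_b: str) -> str:
    """Summarize how far path_b has drifted from path_a."""
    old = Path(path_a).read_text(encoding="utf-8")
    new = Path(path_b).read_text(encoding="utf-8")
    removed, added = count_changed_lines(old, new)
    return f"-{removed} +{added} ({removed + added} changed lines)"
```

To use it, run `print(compare_files("services/search/content.py", "src/search/content.py"))` from the repository root. It prints the removed and added line counts between the two copies.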

Key differences:

  • The src/ version adds import copy and uses typing.List[] type hints (older style), while services/ uses built-in list[] generics (PEP 585, Python 3.9+)
  • _is_private_host() orders its logic differently, and the src/ version adds explicit checks for the hosts localhost, metadata.google.internal, and metadata
  • _resolve_hostname_ips() uses different iteration patterns
  • _get_public_url() has a different signature: the src/ version takes keyword-only arguments
  • src/ version has an extra _extract_og_image() function

This means web content fetching behaves differently depending on which code path is taken.

Risk

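Security-sensitive URL validation (SSRF protection) differs between the two copies. The src/ version has stricter checks, so callers that go through services/search/content.py miss its additional explicit hostname checks.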

Fix

  1. Merge the security improvements from src/ into services/ (canonical location)
  2. Delete the src/search/ duplicate
  3. See #771 for the broader dedup task
Reference: sleepy/odysseus#784