[search] content.py diverges significantly between services/search/ and src/search/ — security risk #784
sleepy/odysseus#784
Problem
services/search/content.py (360 lines) and src/search/content.py (402 lines) have 160 lines of diff, making this the most divergent pair in the duplicated search modules.

Key differences:
- src/ version adds import copy and uses List[] type hints (older style) vs list[] (newer)
- _is_private_host() has different logic ordering, and the src/ version includes additional host checks (localhost, metadata.google.internal, metadata)
- _resolve_hostname_ips() uses different iteration patterns
- _get_public_url() has a different signature (keyword-only args in src/)
- src/ version has an extra _extract_og_image() function

This means web content fetching behaves differently depending on which code path is taken.
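
To make the host-check difference concrete, here is a minimal sketch of what a stricter private-host check could look like. It is not the code from either copy of content.py: the function name is_private_host_sketch and the BLOCKED_HOSTNAMES set are hypothetical, and only the three hostnames come from this issue. The IP-range tests use the standard ipaddress module.

```python
import ipaddress

# Hostnames this issue lists as extra checks in the src/ version.
# The set name and the function below are hypothetical illustrations.
BLOCKED_HOSTNAMES = {"localhost", "metadata.google.internal", "metadata"}


def is_private_host_sketch(host: str) -> bool:
    """Return True if the host should be refused as a fetch target."""
    # Normalize case, surrounding whitespace, and a trailing root dot.
    normalized = host.strip().rstrip(".").lower()
    if not normalized or normalized in BLOCKED_HOSTNAMES:
        return True
    try:
        # Accept bracketed IPv6 literals such as "[::1]".
        ip = ipaddress.ip_address(normalized.strip("[]"))
    except ValueError:
        # Not an IP literal. A real check would also resolve the name
        # and test every returned address; that step is omitted here.
        return False
    return (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_reserved
        or ip.is_multicast
        or ip.is_unspecified
    )
```

A copy that lacks the hostname set would let a URL such as http://metadata.google.internal/ through whenever the name check is the only barrier, which is the kind of gap the stricter version closes.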
Risk
Security-sensitive URL validation (SSRF protection) differs between the two copies. The src/ version has stricter checks.

Fix
- Merge the stricter checks from src/ into services/ (canonical location)
- Remove the src/search/ duplicate
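
Until the duplicate is removed, the divergence between the two files can be tracked mechanically. The sketch below counts added and removed lines in a unified diff using the standard difflib module. The helper name count_changed_lines is hypothetical, and this way of counting is an assumption, so it may not match how the 160-line figure in this issue was measured.

```python
import difflib
from itertools import islice


def count_changed_lines(old_text: str, new_text: str) -> int:
    """Count added plus removed lines between two file contents."""
    diff = difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        lineterm="",
        n=0,  # no context lines, only the changes themselves
    )
    changed = 0
    # The first two lines are the "---" and "+++" file headers; skip them.
    for line in islice(diff, 2, None):
        if line.startswith(("+", "-")):
            changed += 1
    return changed


# Example usage against the two paths named in this issue:
# from pathlib import Path
# a = Path("services/search/content.py").read_text()
# b = Path("src/search/content.py").read_text()
# print(count_changed_lines(a, b))
```

A result of 0 after the merge would indicate the two copies are identical, at which point deleting the duplicate changes no behavior.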