Automated, quick testing of research papers to identify strong signals
Sweep (fast model, 2x per day)
- Read all paper titles (comp sci) from the last 12h
- Choose up to 20 strong signal papers based on the titles
- Fetch the summaries of those 20
- Choose 10 strongest based on title + summary
- Read those 10 in full
- Move the strongest 5 to the 'For Review' list
- Discard the rest
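
The Sweep funnel above (20 by title, 10 by title + summary, 5 by full text) can be sketched as follows. This is a minimal illustration, not the actual implementation: `Paper`, `top_k`, `sweep`, and the `score`, `fetch_summary`, and `fetch_full_text` callables are hypothetical names, and `score` stands in for whatever judgment the fast model makes.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Paper:
    paper_id: str
    title: str
    summary: str = ""
    full_text: str = ""


def top_k(papers: List[Paper], k: int, text_of: Callable[[Paper], str],
          score: Callable[[str], float]) -> List[Paper]:
    """Return the k highest-scoring papers, judged on text_of(paper)."""
    return sorted(papers, key=lambda p: score(text_of(p)), reverse=True)[:k]


def sweep(papers: List[Paper],
          score: Callable[[str], float],
          fetch_summary: Callable[[str], str],
          fetch_full_text: Callable[[str], str]) -> List[Paper]:
    # Stage 1: titles only, keep up to 20.
    shortlist = top_k(papers, 20, lambda p: p.title, score)

    # Stage 2: fetch summaries for the shortlist only, keep 10.
    for p in shortlist:
        p.summary = fetch_summary(p.paper_id)
    shortlist = top_k(shortlist, 10, lambda p: f"{p.title}\n{p.summary}", score)

    # Stage 3: read the survivors in full, keep the strongest 5.
    for p in shortlist:
        p.full_text = fetch_full_text(p.paper_id)
    return top_k(
        shortlist, 5,
        lambda p: f"{p.title}\n{p.summary}\n{p.full_text}",
        score,
    )
```

The returned papers are the ones that would go to 'For Review'; everything else is dropped. Fetching only happens for papers that survive the previous stage, which keeps the expensive steps limited to 20 summaries and 10 full reads per run.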
Filter (fast model, 1x per day)
- Read the 10 papers in the 'For Review' list
- Choose 5 strongest and add them to 'Potential investigation'
- Move the remaining 5 to 'Archive'
Select (strong model, 1x every 2 days)
- Read the 'Potential investigation' papers
- Move 2 strongest signals to 'Active'
- Move 5 weakest to 'Archive'
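
The Filter and Select steps above amount to moving papers between named lists. A minimal sketch, assuming papers are plain string IDs and `score` is a hypothetical stand-in for the model's ranking (`new_board`, `filter_stage`, and `select_stage` are illustrative names too):

```python
from typing import Callable, Dict, List

Board = Dict[str, List[str]]

LISTS = ("For Review", "Potential investigation", "Active", "Archive")


def new_board() -> Board:
    """Create an empty board with one list per pipeline state."""
    return {name: [] for name in LISTS}


def filter_stage(board: Board, score: Callable[[str], float]) -> None:
    """Filter: top 5 of 'For Review' move on, the rest are archived."""
    ranked = sorted(board["For Review"], key=score, reverse=True)
    board["Potential investigation"] += ranked[:5]
    board["Archive"] += ranked[5:]
    board["For Review"] = []


def select_stage(board: Board, score: Callable[[str], float]) -> None:
    """Select: 2 strongest become 'Active', 5 weakest are archived."""
    ranked = sorted(board["Potential investigation"], key=score, reverse=True)
    promoted, rest = ranked[:2], ranked[2:]
    board["Active"] += promoted
    board["Archive"] += rest[-5:]                 # up to 5 weakest
    board["Potential investigation"] = rest[:-5]  # whatever is left stays
```

With these numbers, two daily Filter runs feed 10 papers into one Select run, so 3 papers stay in 'Potential investigation' and compete again in the next round (assuming the list started empty).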
Verify (development model, 1x every 2 days, two instances in parallel)
- Spawn a worker in a Docker container (RTX 3090 / 24 GB VRAM, 50 GB storage, access to the model bank and dataset bank) with one of the 'Active' papers
- Worker builds a mini-wiki of related research, cited research, online discourse on the topic (1-2h tops)
- Worker starts iterating on applying and testing the paper's proposal
- If the paper requires more resources, the worker can call a function to request more VRAM/storage
- End criteria: out of time (48h), or a noticeable improvement achieved / paper verified
- Fast model is used to nudge the workers when they stop or ask questions
- On success, the strong model checks the result for noteworthy problems; if it finds any, the paper moves to 'Backlog' together with a result note
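
The worker's end criteria and the review on success can be sketched as a bounded loop. This is an assumption-laden illustration: `run_worker`, `after_success`, `Outcome`, and the `step` callable (one iteration of applying and testing the paper, returning `True` once the result counts as verified) are hypothetical names, and the sketch leaves out the nudging and the resource requests.

```python
import time
from enum import Enum
from typing import Callable, List, Tuple

TIME_BUDGET_S = 48 * 3600  # the 48h out-of-time limit


class Outcome(Enum):
    VERIFIED = "verified"
    OUT_OF_TIME = "out_of_time"


def run_worker(step: Callable[[], bool],
               budget_s: float = TIME_BUDGET_S,
               clock: Callable[[], float] = time.monotonic) -> Tuple[Outcome, int]:
    """Call step() until it reports success or the time budget is spent."""
    deadline = clock() + budget_s
    iterations = 0
    while clock() < deadline:
        iterations += 1
        if step():
            return Outcome.VERIFIED, iterations
    return Outcome.OUT_OF_TIME, iterations


def after_success(problems: List[str]) -> str:
    """Route a verified result: any noted problem sends it to 'Backlog'."""
    return "Backlog" if problems else "Report"
```

Passing `clock` as a parameter is only there so the loop can be exercised without waiting 48 hours. What happens to a paper that runs out of time is not specified above, so the sketch does not route that case.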
Hold (strong model)
- Papers that the strong model considers worth investigating but that cannot be scaled down for verification in the standard worker containers
- Strong model writes a brief (3-5 sentences) on why the paper is worth the extra resources despite not being verifiable at small scale
- Paper is queued for manual review with the brief attached. User reads the pitch, not the paper
Report (strong model)
- If any research paper was verified by a worker and looks sound to the strong model, escalate it to the user's attention
Footnote
- Fast models — DeepSeek V4 Flash / Qwen3.6 35B-A3B
- Strong models — frontiers
- Dev models — the orchestration mix of models I use for development usually (changes often)
- Model bank — a collection of 200M–2B models with different architectures for testing
- Dataset bank — a collection of about 100B tokens worth of different curated datasets for testing
- Paper fetching — arXiv papers are available as HTML (arxiv.org/html/{paper_id}), so no PDF parsing is needed for papers submitted after Dec 2023; older papers fall back to ar5iv or a local pdf-to-md script
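
The source selection for paper fetching could look roughly like this. It is a sketch under stated assumptions: the cutoff is read off the `YYMM` prefix of new-style arXiv IDs, with `2312` and later treated as having native HTML; the ar5iv host `ar5iv.labs.arxiv.org` is assumed; and `html_source` is a hypothetical helper. The local pdf-to-md fallback is not modelled.

```python
import re

# New-style arXiv IDs look like 2401.01234, optionally with a version (v2).
NEW_STYLE_ID = re.compile(r"^(\d{2})(\d{2})\.\d{4,5}(v\d+)?$")

# Assumption: IDs from December 2023 (2312) onward have native HTML.
HTML_CUTOFF_YYMM = 2312


def html_source(paper_id: str) -> str:
    """Pick the HTML URL to try first for a given arXiv ID."""
    match = NEW_STYLE_ID.match(paper_id)
    if match:
        yymm = int(match.group(1)) * 100 + int(match.group(2))
        if yymm >= HTML_CUTOFF_YYMM:
            return f"https://arxiv.org/html/{paper_id}"
    # Older papers and old-style IDs (e.g. hep-th/9901001): try ar5iv.
    # The ID is passed through unchanged, including any version suffix.
    return f"https://ar5iv.labs.arxiv.org/html/{paper_id}"
```

A caller would still need to handle a failed fetch on either URL, since not every recent paper necessarily has an HTML rendering; that is where the pdf-to-md script would come in.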