README.md Update README.md 2026-05-26 23:01:50 +02:00

Automated, short tests of research papers to identify the ones that carry a strong signal

Sweep (fast model, 2x per day)

  1. Read all paper titles (comp sci) from the last 12h
  2. Choose up to 20 strong signal papers based on the titles
  3. Fetch the summaries of those 20
  4. Choose the 10 strongest based on title + summary
  5. Read those 10 in full
  6. Move the strongest 5 to the 'For Review' list
  7. Discard the rest
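
The Sweep funnel above can be sketched as three successive top-k cuts. This is a minimal sketch, not the actual implementation: `top_k`, `sweep`, and the three scoring callables are hypothetical names, and each scorer stands in for a fast-model call that rates a paper on its title, summary, or full text.

```python
from typing import Callable, List, TypeVar

Paper = TypeVar("Paper")
Scorer = Callable[[Paper], float]  # hypothetical: wraps a fast-model rating call


def top_k(items: List[Paper], score: Scorer, k: int) -> List[Paper]:
    """Return the k highest-scoring items, best first."""
    return sorted(items, key=score, reverse=True)[:k]


def sweep(papers, score_title, score_summary, score_full):
    """Narrow one 12h batch: titles -> 20, summaries -> 10, full text -> 5."""
    shortlist = top_k(papers, score_title, 20)       # steps 1-2
    finalists = top_k(shortlist, score_summary, 10)  # steps 3-4
    return top_k(finalists, score_full, 5)           # steps 5-6, rest is discarded
```

Because each stage only scores the survivors of the previous one, the expensive full-text read is limited to 10 papers per run.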

Filter (fast model, 1x per day)

  1. Read the 10 papers in the 'For Review' list
  2. Choose the 5 strongest and add them to 'Potential investigation'
  3. Move the remaining 5 to 'Archive'
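
A minimal sketch of the Filter step as a move between named lists, assuming the lists are kept as a plain dict of Python lists. `filter_step` and `score` are hypothetical names; `score` stands in for the fast model's ranking.

```python
def filter_step(lists, score, keep=5):
    """Split 'For Review' into the strongest `keep` papers and the rest."""
    ranked = sorted(lists["For Review"], key=score, reverse=True)
    lists["Potential investigation"].extend(ranked[:keep])  # strongest 5
    lists["Archive"].extend(ranked[keep:])                  # everything else
    lists["For Review"].clear()                             # list is now processed
    return lists
```

Clearing 'For Review' at the end keeps the next daily run from re-reading papers that were already sorted.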

Select (strong model, 1x every 2 days)

  1. Read the 'Potential investigation' papers
  2. Move the 2 strongest signals to 'Active'
  3. Move the 5 weakest to 'Archive'
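
Taking the stated numbers at face value, Filter adds 5 papers per day, so each two-day Select cycle sees 10 new papers and moves 7 of them out (2 to 'Active', 5 to 'Archive'). The sketch below only does that arithmetic; `simulate_backlog` is a hypothetical helper, and it assumes nothing else removes papers from 'Potential investigation'.

```python
def simulate_backlog(cycles, added_per_cycle=10, promoted=2, archived=5):
    """Track how many papers remain in 'Potential investigation' after each Select run."""
    pending = 0
    history = []
    for _ in range(cycles):
        pending += added_per_cycle                      # two days of Filter output
        pending -= min(pending, promoted + archived)    # Select moves 2 + 5 out
        history.append(pending)
    return history
```

Under these assumptions `simulate_backlog(4)` returns `[3, 6, 9, 12]`, i.e. 3 papers carry over per cycle, so the middle-ranked papers get re-read on later runs unless the counts are adjusted.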

Verify (development model, 1x every 2 days, two instances in parallel)

  1. Spawn a worker in a Docker container (3090 / 24 GB VRAM, 50 GB storage, access to the model bank and dataset bank) with one of the 'Active' papers
  2. Worker builds a mini-wiki of related research, cited research, online discourse on the topic (1-2h tops)
  3. Worker starts iterating on applying and testing the paper's proposal
    • If the paper requires more resources, the worker can call a function to request more VRAM/storage
  4. End criteria: out of time (48h), or a noticeable improvement achieved / paper verified
    • Fast model is used to nudge the workers when they stop or ask questions
  5. On success, the strong model checks the result for noteworthy problems; if it finds any, the paper moves to 'Backlog' together with a result note
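
The end criteria in step 4 can be sketched as a loop with a time budget. This is a sketch under assumptions: `run_worker`, `Outcome`, and `step` are hypothetical names, and `step` stands in for one iteration of the worker applying and testing the paper, returning `"verified"`, `"improved"`, or `None` to keep going.

```python
import time
from dataclasses import dataclass


@dataclass
class Outcome:
    status: str       # "verified", "improved", or "timeout"
    iterations: int   # how many iterations were run


def run_worker(step, budget_seconds=48 * 3600, clock=time.monotonic):
    """Iterate until the paper is verified/improved or the 48h budget runs out."""
    start = clock()
    iterations = 0
    while clock() - start < budget_seconds:
        iterations += 1
        result = step()  # hypothetical: one apply-and-test iteration
        if result in ("verified", "improved"):
            return Outcome(result, iterations)
    return Outcome("timeout", iterations)
```

The `clock` parameter is injectable so the loop can be tested without waiting; the budget is only checked between iterations, so a long final iteration can overrun it.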

Hold (strong model)

  1. For papers that the strong model considers worth investigating but that cannot be scaled down for verification in the standard worker containers
  2. Strong model writes a brief (3-5 sentences) on why the paper is worth the extra resources despite not being verifiable at small scale
  3. Paper is queued for manual review with the brief attached. User reads the pitch, not the paper

Report (strong model)

  1. If a research paper was verified by a worker and looks sound to the strong model, escalate it to the user's attention

Footnote

  • Fast models — DeepSeek V4 Flash / Qwen3.6 35B-A3B
  • Strong models — frontier models
  • Dev models — the orchestration mix of models I use for development usually (changes often)
  • Model bank — a collection of 200M to 2B parameter models with different architectures for testing
  • Dataset bank — a collection of about 100B tokens worth of different curated datasets for testing
  • Paper fetching — arXiv papers are available as HTML (arxiv.org/html/{paper_id}), so no PDF parsing is needed for papers submitted after Dec 2023; older papers fall back to ar5iv or a local pdf-to-md script
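
The fetching rule above can be sketched as a URL chooser keyed on the `YYMM` prefix of a new-style arXiv ID. This is a sketch under assumptions: `paper_url` and `HTML_CUTOFF_YYMM` are hypothetical names, the cutoff of `2401` is one reading of "after Dec 2023", and the ar5iv URL pattern is assumed to be `ar5iv.labs.arxiv.org/html/{paper_id}`. The local pdf-to-md fallback is not shown.

```python
import re

HTML_CUTOFF_YYMM = 2401  # assumption: first YYMM treated as "after Dec 2023"


def paper_url(paper_id: str) -> str:
    """Pick the arXiv HTML page for recent papers, ar5iv for older ones."""
    match = re.fullmatch(r"(\d{4})\.\d{4,5}(v\d+)?", paper_id)
    if match and int(match.group(1)) >= HTML_CUTOFF_YYMM:
        return f"https://arxiv.org/html/{paper_id}"
    # Older new-style IDs and old-style IDs (e.g. "hep-th/9901001") fall back to ar5iv
    return f"https://ar5iv.labs.arxiv.org/html/{paper_id}"
```

A fetcher would still need to handle a missing HTML page for a recent paper, since not every submission converts cleanly.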