Coding Harness Feedback Analysis

Last Updated: April 9, 2026

Research analyzing four coding agent harnesses (opencode, pi, hermes, forgecode) to understand what works best for local/smaller models (7B-27B parameters).

What Was Done

  1. Repository Analysis: Each harness's prompts, tools, parsing, and skills system were evaluated for suitability with local models
  2. Community Feedback Synthesis: GitHub issues, Reddit discussions, and Discord reports were compiled for each harness
  3. Research Integration: Findings cross-referenced with agent systems research (prompting, orchestration, evaluation)

Key Output

conclusion.md — Comprehensive analysis covering:

  • What's working well across all four harnesses
  • Critical gaps for local model compatibility
  • Research-backed recommendations with citations
  • Priority fixes (immediate, short-term, medium-term)

Folder Structure

├── conclusion.md          # Main findings and recommendations
│
├── AGENTS.md              # Original project scope and strategy
│
├── opencode/
│   ├── REPO_FEEDBACK.md   # Repository analysis (prompts, tools, parsing)
│   └── feedback/          # Community feedback by model tier
│       ├── frontier/      # GPT-5.4, Claude, Gemini
│       └── localllm/      # Qwen, Gemma, local model issues
│
├── pi/
│   ├── REPO_FEEDBACK.md   # Repository analysis
│   └── feedback/
│       ├── frontier/      # Frontier model feedback
│       └── localllm/      # Local model feedback
│
├── hermes/
│   ├── REPO_FEEDBACK.md   # Repository analysis
│   └── feedback/
│       ├── frontier/      # Claude, GPT feedback
│       ├── localllm/      # Qwen, Gemma, local setup
│       └── general/       # Bug reports, benchmarks
│
└── forgecode/
    ├── REPO_FEEDBACK.md   # Repository analysis
    └── feedback/
        ├── frontier/      # GPT-5.4, Claude, pricing
        └── localllm/      # Qwen, MiniMax, GLM, DeepSeek

Quick Reference

| Harness   | Best For            | Key Limitation                            |
|-----------|---------------------|-------------------------------------------|
| pi-mono   | Local models (7B+)  | Minimal overhead, needs JSON retry layer  |
| hermes    | Frontier & 27B+     | 14K token overhead, needs tiered toolsets |
| forgecode | Sub-agent workflows | Multiple system messages break Qwen3.5    |
| opencode  | Frontier models     | Verbose prompts, no JSON repair           |
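The "JSON retry layer" recommended above could look roughly like the sketch below: attempt to parse the model's tool-call JSON, apply cheap mechanical repairs for mistakes local models commonly make, and only then fall back to asking the model to re-emit. This is an illustrative sketch, not code from any of the four harnesses; `parse_tool_call`, `repair`, and the `llm_retry` hook are hypothetical names.

```python
import json

def repair(text: str) -> str:
    """Cheap mechanical fixes for common local-model JSON mistakes."""
    t = text.strip()
    # Strip markdown code fences the model may have wrapped around the JSON
    if t.startswith("```"):
        t = t.strip("`").removeprefix("json").strip()
    # Trim prose before the first brace and after the last brace
    start, end = t.find("{"), t.rfind("}")
    if start != -1 and end != -1:
        t = t[start:end + 1]
    # Swap single quotes for double quotes when no double quotes exist
    if '"' not in t and "'" in t:
        t = t.replace("'", '"')
    # Drop trailing commas before closing braces/brackets
    return t.replace(",}", "}").replace(",]", "]")

def parse_tool_call(raw: str, retries: int = 2, llm_retry=None):
    """Parse a tool call, repairing mechanically before re-prompting.

    llm_retry is an optional hypothetical hook that asks the model
    to re-emit valid JSON; without it, unrecoverable input raises.
    """
    attempt = raw
    for _ in range(retries + 1):
        try:
            return json.loads(attempt)
        except json.JSONDecodeError:
            repaired = repair(attempt)
            if repaired != attempt:
                attempt = repaired        # retry with the mechanical fix
            elif llm_retry is not None:
                attempt = llm_retry(attempt)  # last resort: re-prompt
            else:
                raise
    raise ValueError("could not recover valid JSON")
```

The key design point is ordering: mechanical repair is free, so it runs before any re-prompt, which costs a full model round trip.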

Research Sources

Analysis cross-references findings from:

  • SOLVE-Med / MATA (small-model orchestration)
  • ATLAS (generate-verify-repair with 14B models)
  • StateFlow (FSM-based agent loops)
  • JetBrains (observation masking vs summarization)
  • Anthropic (Building Effective AI Agents)
  • Anthropic (Harness Design for Long-Running Apps)
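The observation-masking approach referenced in the JetBrains entry can be sketched simply: rather than summarizing stale tool outputs (which costs a model call and can lose details), the harness replaces all but the most recent observations with a short placeholder. The function below is a minimal illustration assuming a plain list-of-dicts message format; the name and parameters are hypothetical, not taken from any of the harnesses.

```python
def mask_old_observations(messages, keep_last=2,
                          placeholder="[output elided; re-run the tool if needed]"):
    """Return a copy of the transcript with old tool outputs masked.

    Keeps the last `keep_last` tool observations verbatim and replaces
    earlier ones with `placeholder`, leaving all other roles untouched.
    """
    tool_indices = [i for i, m in enumerate(messages) if m["role"] == "tool"]
    recent = set(tool_indices[-keep_last:])
    return [
        {**m, "content": placeholder}
        if m["role"] == "tool" and i not in recent else m
        for i, m in enumerate(messages)
    ]
```

Masking is deterministic and preserves the conversation's shape, which matters for local models that are sensitive to message-role patterns.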

See Research*.md for full research notes.