Coding Harness Feedback Analysis

Last Updated: April 9, 2026

Research analyzing four coding agent harnesses (opencode, pi, hermes, forgecode) to understand what works best for local/smaller models (7B-27B parameters).

What Was Done

  1. Repository Analysis: Each harness's prompts, tools, parsing, and skills system were evaluated for suitability with local models
  2. Community Feedback Synthesis: GitHub issues, Reddit discussions, and Discord reports were compiled for each harness
  3. Research Integration: Findings cross-referenced with agent systems research (prompting, orchestration, evaluation)

Key Output

conclusion.md — Comprehensive analysis covering:

  • What's working well across all four harnesses
  • Critical gaps for local model compatibility
  • Research-backed recommendations with citations
  • Priority fixes (immediate, short-term, medium-term)

Folder Structure

├── conclusion.md          # Main findings and recommendations
│
├── AGENTS.md              # Original project scope and strategy
│
├── opencode/
│   ├── REPO_FEEDBACK.md   # Repository analysis (prompts, tools, parsing)
│   └── feedback/          # Community feedback by model tier
│       ├── frontier/      # GPT-5.4, Claude, Gemini
│       └── localllm/      # Qwen, Gemma, local model issues
│
├── pi/
│   ├── REPO_FEEDBACK.md   # Repository analysis
│   └── feedback/
│       ├── frontier/      # Frontier model feedback
│       └── localllm/      # Local model feedback
│
├── hermes/
│   ├── REPO_FEEDBACK.md   # Repository analysis
│   └── feedback/
│       ├── frontier/      # Claude, GPT feedback
│       ├── localllm/      # Qwen, Gemma, local setup
│       └── general/       # Bug reports, benchmarks
│
└── forgecode/
    ├── REPO_FEEDBACK.md   # Repository analysis
    └── feedback/
        ├── frontier/      # GPT-5.4, Claude, pricing
        └── localllm/      # Qwen, MiniMax, GLM, DeepSeek

Quick Reference

| Harness   | Best For            | Key Limitation                            |
|-----------|---------------------|-------------------------------------------|
| pi-mono   | Local models (7B+)  | Minimal overhead, needs JSON retry layer  |
| hermes    | Frontier & 27B+     | 14K token overhead, needs tiered toolsets |
| forgecode | Sub-agent workflows | Multiple system messages break Qwen3.5    |
| opencode  | Frontier models     | Verbose prompts, no JSON repair           |
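The "JSON retry layer" recommended above could look roughly like the sketch below: attempt to parse the model's tool-call JSON, apply cheap mechanical repairs for mistakes local models commonly make, and only then fall back to asking the model to re-emit. This is an illustrative sketch, not code from any of the four harnesses; `parse_tool_call`, `repair`, and the `llm_retry` hook are hypothetical names.

```python
import json

def repair(text: str) -> str:
    """Cheap mechanical fixes for common local-model JSON mistakes."""
    t = text.strip()
    # Strip markdown code fences the model may have wrapped around the JSON
    if t.startswith("```"):
        t = t.strip("`").removeprefix("json").strip()
    # Trim prose before the first brace and after the last brace
    start, end = t.find("{"), t.rfind("}")
    if start != -1 and end != -1:
        t = t[start:end + 1]
    # Swap single quotes for double quotes when no double quotes exist
    if '"' not in t and "'" in t:
        t = t.replace("'", '"')
    # Drop trailing commas before closing braces/brackets
    return t.replace(",}", "}").replace(",]", "]")

def parse_tool_call(raw: str, retries: int = 2, llm_retry=None):
    """Parse a tool call, repairing mechanically before re-prompting.

    llm_retry is an optional hypothetical hook that asks the model
    to re-emit valid JSON; without it, unrecoverable input raises.
    """
    attempt = raw
    for _ in range(retries + 1):
        try:
            return json.loads(attempt)
        except json.JSONDecodeError:
            repaired = repair(attempt)
            if repaired != attempt:
                attempt = repaired        # retry with the mechanical fix
            elif llm_retry is not None:
                attempt = llm_retry(attempt)  # last resort: re-prompt
            else:
                raise
    raise ValueError("could not recover valid JSON")
```

The key design point is ordering: mechanical repair is free, so it runs before any re-prompt, which costs a full model round trip.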

Research Sources

Analysis cross-references findings from:

  • SOLVE-Med / MATA (small-model orchestration)
  • ATLAS (generate-verify-repair with 14B models)
  • StateFlow (FSM-based agent loops)
  • JetBrains (observation masking vs summarization)
  • Anthropic (Building Effective AI Agents)
  • Anthropic (Harness Design for Long-Running Apps)
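The observation-masking approach referenced in the JetBrains entry can be sketched simply: rather than summarizing stale tool outputs (which costs a model call and can lose details), the harness replaces all but the most recent observations with a short placeholder. The function below is a minimal illustration assuming a plain list-of-dicts message format; the name and parameters are hypothetical, not taken from any of the harnesses.

```python
def mask_old_observations(messages, keep_last=2,
                          placeholder="[output elided; re-run the tool if needed]"):
    """Return a copy of the transcript with old tool outputs masked.

    Keeps the last `keep_last` tool observations verbatim and replaces
    earlier ones with `placeholder`, leaving all other roles untouched.
    """
    tool_indices = [i for i, m in enumerate(messages) if m["role"] == "tool"]
    recent = set(tool_indices[-keep_last:])
    return [
        {**m, "content": placeholder}
        if m["role"] == "tool" and i not in recent else m
        for i, m in enumerate(messages)
    ]
```

Masking is deterministic and preserves the conversation's shape, which matters for local models that are sensitive to message-role patterns.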

See Research*.md for full research notes.