# Coding Harness Feedback Analysis

*Last Updated: April 9, 2026*
Research analyzing four coding agent harnesses (opencode, pi, hermes, forgecode) to identify what works best for local and smaller models (7B–27B parameters).
## What Was Done
- **Repository Analysis:** Each harness was analyzed for its prompts, tools, output parsing, and skills system, with a focus on suitability for local models
- **Community Feedback Synthesis:** GitHub issues, Reddit discussions, and Discord reports were compiled per harness
- **Research Integration:** Findings were cross-referenced with agent systems research (prompting, orchestration, evaluation)
## Key Output

`conclusion.md` — a comprehensive analysis covering:
- What's working well across all four harnesses
- Critical gaps for local model compatibility
- Research-backed recommendations with citations
- Priority fixes (immediate, short-term, medium-term)
## Folder Structure
```
.
├── conclusion.md            # Main findings and recommendations
├── AGENTS.md                # Original project scope and strategy
├── opencode/
│   ├── REPO_FEEDBACK.md     # Repository analysis (prompts, tools, parsing)
│   └── feedback/            # Community feedback by model tier
│       ├── frontier/        # GPT-5.4, Claude, Gemini
│       └── localllm/        # Qwen, Gemma, local model issues
├── pi/
│   ├── REPO_FEEDBACK.md     # Repository analysis
│   └── feedback/
│       ├── frontier/        # Frontier model feedback
│       └── localllm/        # Local model feedback
├── hermes/
│   ├── REPO_FEEDBACK.md     # Repository analysis
│   └── feedback/
│       ├── frontier/        # Claude, GPT feedback
│       ├── localllm/        # Qwen, Gemma, local setup
│       └── general/         # Bug reports, benchmarks
└── forgecode/
    ├── REPO_FEEDBACK.md     # Repository analysis
    └── feedback/
        ├── frontier/        # GPT-5.4, Claude, pricing
        └── localllm/        # Qwen, MiniMax, GLM, DeepSeek
```
## Quick Reference
| Harness | Best For | Key Limitation |
|---|---|---|
| pi-mono | Local models (7B+); minimal overhead | Needs a JSON retry layer |
| hermes | Frontier & 27B+ models | 14K-token overhead; needs tiered toolsets |
| forgecode | Sub-agent workflows | Multiple system messages break Qwen3.5 |
| opencode | Frontier models | Verbose prompts; no JSON repair |
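The "JSON retry layer" flagged for pi-mono and the missing "JSON repair" in opencode refer to the same pattern: smaller models frequently emit tool calls as almost-valid JSON (wrapped in code fences, with trailing commas, or surrounded by prose), and the harness should attempt cheap repairs before burning a retry round-trip. A minimal sketch of such a layer — the function name, repair rules, and return convention are illustrative assumptions, not any harness's actual API:

```python
import json
import re


def repair_json(raw: str):
    """Attempt to parse model output as JSON, applying common repairs.

    Returns the parsed object, or None if unrecoverable (the caller
    should then re-prompt the model). All repair rules are illustrative.
    """
    candidates = [raw]

    # Strip markdown code fences the model may have wrapped around the JSON.
    fenced = re.search(r"```(?:json)?\s*(.*?)```", raw, re.DOTALL)
    if fenced:
        candidates.append(fenced.group(1))

    # Extract the outermost {...} span in case of leading/trailing prose.
    braced = re.search(r"\{.*\}", raw, re.DOTALL)
    if braced:
        candidates.append(braced.group(0))

    for text in candidates:
        # Drop trailing commas before } or ] — a frequent small-model slip.
        # (Naive: could touch string contents; fine for a sketch.)
        cleaned = re.sub(r",\s*([}\]])", r"\1", text.strip())
        try:
            return json.loads(cleaned)
        except json.JSONDecodeError:
            continue
    return None  # signal the harness to retry with a correction message
```

A harness would call this on each tool-call payload; on `None`, it appends a short correction message ("your last tool call was not valid JSON") and retries, typically capped at two or three attempts so a stuck model fails fast.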
## Research Sources

The analysis cross-references findings from:
- SOLVE-Med / MATA (small-model orchestration)
- ATLAS (generate-verify-repair with 14B models)
- StateFlow (FSM-based agent loops)
- JetBrains (observation masking vs summarization)
- Anthropic (Building Effective AI Agents)
- Anthropic (Harness Design for Long-Running Apps)
See `../entropy/Research/md/` for full research notes.