# Coding Harness Feedback Analysis
**Last Updated:** April 9, 2026
Research analyzing four coding agent harnesses (opencode, pi, hermes, forgecode) to understand what works best for local/smaller models (7B-27B parameters).
## What Was Done
1. **Repository Analysis**: Each harness was analyzed for prompts, tools, parsing, and skills system suitability for local models
2. **Community Feedback Synthesis**: GitHub issues, Reddit discussions, and Discord reports compiled per harness
3. **Research Integration**: Findings cross-referenced with agent systems research (prompting, orchestration, evaluation)
## Key Output
**`conclusion.md`** — Comprehensive analysis covering:
- What's working well across all four harnesses
- Critical gaps for local model compatibility
- Research-backed recommendations with citations
- Priority fixes (immediate, short-term, medium-term)
## Folder Structure
```
├── conclusion.md            # Main findings and recommendations
├── AGENTS.md                # Original project scope and strategy
├── opencode/
│   ├── REPO_FEEDBACK.md     # Repository analysis (prompts, tools, parsing)
│   └── feedback/            # Community feedback by model tier
│       ├── frontier/        # GPT-5.4, Claude, Gemini
│       └── localllm/        # Qwen, Gemma, local model issues
├── pi/
│   ├── REPO_FEEDBACK.md     # Repository analysis
│   └── feedback/
│       ├── frontier/        # Frontier model feedback
│       └── localllm/        # Local model feedback
├── hermes/
│   ├── REPO_FEEDBACK.md     # Repository analysis
│   └── feedback/
│       ├── frontier/        # Claude, GPT feedback
│       ├── localllm/        # Qwen, Gemma, local setup
│       └── general/         # Bug reports, benchmarks
└── forgecode/
    ├── REPO_FEEDBACK.md     # Repository analysis
    └── feedback/
        ├── frontier/        # GPT-5.4, Claude, pricing
        └── localllm/        # Qwen, MiniMax, GLM, DeepSeek
```
## Quick Reference
| Harness | Best For | Key Limitation |
|---------|----------|----------------|
| **pi-mono** | Local models (7B+); minimal overhead | Needs a JSON retry layer |
| **hermes** | Frontier & 27B+ | 14K token overhead, needs tiered toolsets |
| **forgecode** | Sub-agent workflows | Multiple system messages break Qwen3.5 |
| **opencode** | Frontier models | Verbose prompts, no JSON repair |
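Two rows above cite a missing "JSON retry/repair layer" as a limitation. As a rough illustration of what such a layer means in practice, here is a minimal sketch (not taken from any of the four harnesses; the function name and repair heuristics are my own assumptions) of a best-effort parser for model-emitted tool-call JSON:

```python
import json
import re

def repair_json(raw: str) -> dict:
    """Best-effort parse of a model-emitted tool-call payload.

    Tries strict parsing first, then applies cheap repairs covering
    two common local-model failure modes: wrapping the payload in a
    markdown code fence, and emitting trailing commas.
    """
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    # Strip a ```json ... ``` fence if the model wrapped its output.
    cleaned = re.sub(r"^```(?:json)?\s*|\s*```$", "", raw.strip())
    # Drop trailing commas before a closing brace or bracket.
    cleaned = re.sub(r",\s*([}\]])", r"\1", cleaned)
    return json.loads(cleaned)  # still raises if genuinely malformed
```

In a real harness this would typically sit in front of the tool dispatcher, with a bounded re-prompt loop as the fallback when repair also fails.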
## Research Sources
Analysis cross-references findings from:
- SOLVE-Med / MATA (small-model orchestration)
- ATLAS (generate-verify-repair with 14B models)
- StateFlow (FSM-based agent loops)
- JetBrains (observation masking vs summarization)
- Anthropic (Building Effective AI Agents)
- Anthropic (Harness Design for Long-Running Apps)
See `../entropy/Research/md/` for full research notes.