# Coding Harness Feedback Analysis
**Last Updated:** April 9, 2026
Research analyzing four coding agent harnesses (opencode, pi, hermes, forgecode) to understand what works best for local/smaller models (7B-27B parameters).
## What Was Done
1. **Repository Analysis**: Each harness was analyzed for prompts, tools, parsing, and skills system suitability for local models
2. **Community Feedback Synthesis**: GitHub issues, Reddit discussions, and Discord reports compiled per harness
3. **Research Integration**: Findings cross-referenced with agent systems research (prompting, orchestration, evaluation)
## Key Output
**`conclusion.md`** — Comprehensive analysis covering:
- What's working well across all four harnesses
- Critical gaps for local model compatibility
- Research-backed recommendations with citations
- Priority fixes (immediate, short-term, medium-term)
## Folder Structure
```
├── conclusion.md              # Main findings and recommendations
├── AGENTS.md                  # Original project scope and strategy
├── opencode/
│   ├── REPO_FEEDBACK.md       # Repository analysis (prompts, tools, parsing)
│   └── feedback/              # Community feedback by model tier
│       ├── frontier/          # GPT-5.4, Claude, Gemini
│       └── localllm/          # Qwen, Gemma, local model issues
├── pi/
│   ├── REPO_FEEDBACK.md       # Repository analysis
│   └── feedback/
│       ├── frontier/          # Frontier model feedback
│       └── localllm/          # Local model feedback
├── hermes/
│   ├── REPO_FEEDBACK.md       # Repository analysis
│   └── feedback/
│       ├── frontier/          # Claude, GPT feedback
│       ├── localllm/          # Qwen, Gemma, local setup
│       └── general/           # Bug reports, benchmarks
└── forgecode/
    ├── REPO_FEEDBACK.md       # Repository analysis
    └── feedback/
        ├── frontier/          # GPT-5.4, Claude, pricing
        └── localllm/          # Qwen, MiniMax, GLM, DeepSeek
```
## Quick Reference
| Harness | Best For | Key Limitation |
|---------|----------|----------------|
| **pi-mono** | Local models (7B+) | Minimal overhead, but needs a JSON retry layer |
| **hermes** | Frontier & 27B+ models | 14K-token prompt overhead; needs tiered toolsets |
| **forgecode** | Sub-agent workflows | Multiple system messages break Qwen3.5 |
| **opencode** | Frontier models | Verbose prompts, no JSON repair |
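
Two rows above flag the same gap: malformed tool-call JSON from local models with no repair or retry path. A minimal sketch of such a layer, assuming tool calls arrive as raw strings; `regenerate` is a hypothetical callback that re-prompts the model for a fresh attempt:

```python
import json
import re

def parse_tool_call(raw: str, max_retries: int = 2, regenerate=None):
    """Parse a model's tool-call JSON, repairing common local-model
    mistakes before falling back to regeneration."""
    attempt = raw
    for _ in range(max_retries + 1):
        # Repair 1: strip markdown code fences the model may wrap around JSON.
        cleaned = re.sub(r"^```(?:json)?\s*|\s*```$", "", attempt.strip())
        # Repair 2: drop trailing commas before a closing brace/bracket.
        cleaned = re.sub(r",\s*([}\]])", r"\1", cleaned)
        try:
            return json.loads(cleaned)
        except json.JSONDecodeError:
            if regenerate is None:
                break
            attempt = regenerate()  # ask the model to emit the call again
    raise ValueError("unrecoverable tool-call JSON")
```

Cheap textual repairs run first because they resolve the most frequent failure modes (fenced output, trailing commas) without spending another model call; regeneration is the fallback, not the default.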
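
The hermes row recommends tiered toolsets: expose a small core toolset to small models and the full set only to larger ones, cutting prompt overhead. A sketch of the idea; the tool names and the 27B threshold are illustrative assumptions, not hermes's actual configuration:

```python
# Hypothetical tool tiers: a minimal core for small local models,
# the full set for frontier-class models.
CORE_TOOLS = ["read_file", "write_file", "bash"]
EXTENDED_TOOLS = CORE_TOOLS + ["grep", "todo_list", "sub_agent", "web_fetch"]

def select_toolset(model_params_b: float) -> list:
    """Pick a tool tier by model size (billions of parameters)."""
    return CORE_TOOLS if model_params_b < 27 else EXTENDED_TOOLS
```

Fewer advertised tools means fewer tokens spent on tool schemas and fewer opportunities for a small model to call the wrong one.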
## Research Sources
Analysis cross-references findings from:
- SOLVE-Med / MATA (small-model orchestration)
- ATLAS (generate-verify-repair with 14B models)
- StateFlow (FSM-based agent loops)
- JetBrains (observation masking vs summarization)
- Anthropic (Building Effective AI Agents)
- Anthropic (Harness Design for Long-Running Apps)
See `../entropy/Research/md/` for full research notes.
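
Of the findings listed, the JetBrains observation-masking result translates most directly into harness code: instead of summarizing old tool outputs, replace all but the most recent ones with a placeholder. A minimal sketch, assuming a simple role/content message format (an assumption, not any harness's actual schema):

```python
def mask_old_observations(messages, keep_last=2, placeholder="[output omitted]"):
    """Mask the content of all but the last `keep_last` tool
    observations, keeping the conversation shape intact."""
    tool_indices = [i for i, m in enumerate(messages) if m["role"] == "tool"]
    masked = set(tool_indices[:-keep_last]) if keep_last else set(tool_indices)
    return [
        {**m, "content": placeholder} if i in masked else m
        for i, m in enumerate(messages)
    ]
```

Masking is cheaper than summarization (no extra model call) and preserves the turn structure, so the model still sees that a tool ran, just not its stale output.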
|