# Coding Harness Feedback Analysis
**Last Updated:** April 9, 2026
Research analyzing four coding agent harnesses (opencode, pi, hermes, forgecode) to understand what works best for local/smaller models (7B-27B parameters).
## What Was Done
1. **Repository Analysis**: Each harness was analyzed for prompts, tools, parsing, and skills system suitability for local models
2. **Community Feedback Synthesis**: GitHub issues, Reddit discussions, and Discord reports compiled per harness
3. **Research Integration**: Findings cross-referenced with agent systems research (prompting, orchestration, evaluation)
## Key Output
**`conclusion.md`** — Comprehensive analysis covering:
- What's working well across all four harnesses
- Critical gaps for local model compatibility
- Research-backed recommendations with citations
- Priority fixes (immediate, short-term, medium-term)
## Folder Structure
```
├── conclusion.md              # Main findings and recommendations
├── AGENTS.md                  # Original project scope and strategy
├── opencode/
│   ├── REPO_FEEDBACK.md       # Repository analysis (prompts, tools, parsing)
│   └── feedback/              # Community feedback by model tier
│       ├── frontier/          # GPT-5.4, Claude, Gemini
│       └── localllm/          # Qwen, Gemma, local model issues
├── pi/
│   ├── REPO_FEEDBACK.md       # Repository analysis
│   └── feedback/
│       ├── frontier/          # Frontier model feedback
│       └── localllm/          # Local model feedback
├── hermes/
│   ├── REPO_FEEDBACK.md       # Repository analysis
│   └── feedback/
│       ├── frontier/          # Claude, GPT feedback
│       ├── localllm/          # Qwen, Gemma, local setup
│       └── general/           # Bug reports, benchmarks
└── forgecode/
    ├── REPO_FEEDBACK.md       # Repository analysis
    └── feedback/
        ├── frontier/          # GPT-5.4, Claude, pricing
        └── localllm/          # Qwen, MiniMax, GLM, DeepSeek
```
## Quick Reference
| Harness | Best For | Key Limitation |
|---------|----------|----------------|
| **pi-mono** | Local models (7B+) | Minimal overhead, but needs a JSON retry layer |
| **hermes** | Frontier & 27B+ models | 14K-token prompt overhead; needs tiered toolsets |
| **forgecode** | Sub-agent workflows | Multiple system messages break Qwen3.5 |
| **opencode** | Frontier models | Verbose prompts, no JSON repair |
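
Two rows above flag the same gap: malformed tool-call JSON from local models with no repair or retry path. A minimal sketch of such a layer, assuming tool calls arrive as raw strings; `regenerate` is a hypothetical callback that re-prompts the model for a fresh attempt:

```python
import json
import re

def parse_tool_call(raw: str, max_retries: int = 2, regenerate=None):
    """Parse a model's tool-call JSON, repairing common local-model
    mistakes before falling back to regeneration."""
    attempt = raw
    for _ in range(max_retries + 1):
        # Repair 1: strip markdown code fences the model may wrap around JSON.
        cleaned = re.sub(r"^```(?:json)?\s*|\s*```$", "", attempt.strip())
        # Repair 2: drop trailing commas before a closing brace/bracket.
        cleaned = re.sub(r",\s*([}\]])", r"\1", cleaned)
        try:
            return json.loads(cleaned)
        except json.JSONDecodeError:
            if regenerate is None:
                break
            attempt = regenerate()  # ask the model to emit the call again
    raise ValueError("unrecoverable tool-call JSON")
```

Cheap textual repairs run first because they resolve the most frequent failure modes (fenced output, trailing commas) without spending another model call; regeneration is the fallback, not the default.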
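
The hermes row recommends tiered toolsets: expose a small core toolset to small models and the full set only to larger ones, cutting prompt overhead. A sketch of the idea; the tool names and the 27B threshold are illustrative assumptions, not hermes's actual configuration:

```python
# Hypothetical tool tiers: a minimal core for small local models,
# the full set for frontier-class models.
CORE_TOOLS = ["read_file", "write_file", "bash"]
EXTENDED_TOOLS = CORE_TOOLS + ["grep", "todo_list", "sub_agent", "web_fetch"]

def select_toolset(model_params_b: float) -> list:
    """Pick a tool tier by model size (billions of parameters)."""
    return CORE_TOOLS if model_params_b < 27 else EXTENDED_TOOLS
```

Fewer advertised tools means fewer tokens spent on tool schemas and fewer opportunities for a small model to call the wrong one.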
## Research Sources
Analysis cross-references findings from:
- SOLVE-Med / MATA (small-model orchestration)
- ATLAS (generate-verify-repair with 14B models)
- StateFlow (FSM-based agent loops)
- JetBrains (observation masking vs summarization)
- Anthropic (Building Effective AI Agents)
- Anthropic (Harness Design for Long-Running Apps)
See `../entropy/Research/md/` for full research notes.
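
Of the findings listed, the JetBrains observation-masking result translates most directly into harness code: instead of summarizing old tool outputs, replace all but the most recent ones with a placeholder. A minimal sketch, assuming a simple role/content message format (an assumption, not any harness's actual schema):

```python
def mask_old_observations(messages, keep_last=2, placeholder="[output omitted]"):
    """Mask the content of all but the last `keep_last` tool
    observations, keeping the conversation shape intact."""
    tool_indices = [i for i, m in enumerate(messages) if m["role"] == "tool"]
    masked = set(tool_indices[:-keep_last]) if keep_last else set(tool_indices)
    return [
        {**m, "content": placeholder} if i in masked else m
        for i, m in enumerate(messages)
    ]
```

Masking is cheaper than summarization (no extra model call) and preserves the turn structure, so the model still sees that a tool ran, just not its stale output.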
|