# ForgeCode Research & Analysis Folder

This folder contains comprehensive research and analysis of the **ForgeCode** coding harness from antinomyhq.

---

## Folder Structure

```
forgecode/
├── feedback/
│   ├── frontier/                        # Frontier/closed-weight model feedback
│   │   ├── claude-opus-4.6.md
│   │   ├── gpt-5.4.md
│   │   ├── gemini-3.1-pro.md
│   │   ├── privacy-security-concerns.md
│   │   ├── pricing-model.md
│   │   ├── feature-comparison-ecosystem.md
│   │   ├── benchmark-controversy.md
│   │   └── summary-best-practices.md
│   └── localllm/                        # Local/open-weight model feedback
│       ├── qwen-3.5.md
│       ├── general-local-models.md
│       ├── tool-calling-reliability.md
│       ├── github-issues-summary.md
│       ├── minimax-glm-deepseek.md
│       └── installation-platform-issues.md
└── README.md                            # This file
```

---

## Key Findings Summary

### Strengths

- **Speed:** 3x faster than Claude Code on identical tasks (Opus 4.6)
- **Multi-model:** 300+ models via OpenRouter
- **Open source:** Apache 2.0, auditable
- **Context efficiency:** ~90% reduction vs full-file inclusion

### Weaknesses

- **Privacy concerns:** Telemetry collects SSH/git data by default
- **Feature gaps:** No checkpoints, auto-memory, or IDE extensions
- **Benchmark questions:** Self-reported scores differ from independent validation
- **GPT 5.4 stability:** "Borderline unusable" despite an 81.8% benchmark score

### Critical Issues

1. **#2894:** Multiple system messages break Qwen 3.5 and similar models
2. **#1318:** Telemetry collection concerns
3. **#2893:** Ghostty terminal resize bug

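Issue #2894 stems from requests that contain more than one system message, which some open-weight chat templates (Qwen among them) reject or mishandle. A common client-side workaround is to collapse all system messages into a single leading one before sending the request. The sketch below assumes OpenAI-style role/content message dicts; the helper name `merge_system_messages` is hypothetical, not part of ForgeCode's actual API:

```python
def merge_system_messages(messages):
    """Collapse all system messages into one leading system message.

    Hypothetical client-side workaround (not ForgeCode's API) for chat
    templates that only tolerate a single system message per request.
    Assumes OpenAI-style {"role": ..., "content": ...} dicts.
    """
    # Gather every system message's text, preserving order.
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    # Keep all non-system messages in their original order.
    rest = [m for m in messages if m["role"] != "system"]
    if not system_parts:
        return rest
    # Join the system texts and place the merged message first.
    merged = {"role": "system", "content": "\n\n".join(system_parts)}
    return [merged] + rest
```

Until the issue is fixed upstream, a shim like this between the harness and a local inference server is one way to keep Qwen-family models usable.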
---

## Model Recommendations

### Best Overall Experience

- **Claude Opus 4.6** - Fast, stable, reliable

### Best Value

- **MiniMax M2.1** - 47.9% score at $0.30/$1.20 per million tokens

### Avoid

- **GPT 5.4** through ForgeCode - Tool calling failures
- **Qwen 3.5** - Broken by #2894 until fixed

---

## Quick Links

- **Repository:** https://github.com/antinomyhq/forgecode
- **Documentation:** https://forgecode.dev/docs/
- **Discord:** https://discord.gg/kRZBPpkgwq
- **TermBench Leaderboard:** https://tbench.ai/leaderboard/terminal-bench/2.0

---

## Feedback Format

Each feedback file includes:

- Model used (name, size, provider)
- Benchmark results or task performance
- Issues encountered
- What worked well
- Source reference (URL or site)

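As an illustration, a feedback file following the format above might be skeletoned like this (every bracketed value is a placeholder, not real data):

```markdown
# <model-name>.md

## Model
<name> (<size>, <provider>)

## Benchmark results / task performance
- <benchmark or task>: <result>

## Issues encountered
- <issue>

## What worked well
- <observation>

## Source
- <URL or site>
```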
---

## Last Updated

April 9, 2026

Compiled from:

- GitHub issues (48 open, 433 closed)
- Reddit discussions (r/ClaudeCode, r/cursor, r/LocalLLaMA)
- DEV Community articles
- ForgeCode blog posts
- Independent benchmark sites (llm-stats.com)
- Academic papers (arXiv)