ForgeCode Research & Analysis Folder
Last Updated: April 9, 2026
This folder contains comprehensive research and analysis of the ForgeCode coding harness from antinomyhq.
Folder Structure
forgecode/
├── feedback/
│   ├── frontier/                  # Frontier/closed-weight model feedback
│   │   ├── claude-opus-4.6.md
│   │   ├── gpt-5.4.md
│   │   ├── gemini-3.1-pro.md
│   │   ├── privacy-security-concerns.md
│   │   ├── pricing-model.md
│   │   ├── feature-comparison-ecosystem.md
│   │   ├── benchmark-controversy.md
│   │   └── summary-best-practices.md
│   └── localllm/                  # Local/open-weight model feedback
│       ├── qwen-3.5.md
│       ├── general-local-models.md
│       ├── tool-calling-reliability.md
│       ├── github-issues-summary.md
│       ├── minimax-glm-deepseek.md
│       └── installation-platform-issues.md
└── README.md                      # This file
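A minimal sketch for checking a local copy against the layout above. The file list is copied from the tree; the helper name `missing_files` and the assumption that the folder is checked out as `forgecode/` in the current directory are illustrative only.

```python
from pathlib import Path

# Expected files, copied from the folder tree (paths relative to forgecode/).
EXPECTED_FILES = [
    "README.md",
    "feedback/frontier/claude-opus-4.6.md",
    "feedback/frontier/gpt-5.4.md",
    "feedback/frontier/gemini-3.1-pro.md",
    "feedback/frontier/privacy-security-concerns.md",
    "feedback/frontier/pricing-model.md",
    "feedback/frontier/feature-comparison-ecosystem.md",
    "feedback/frontier/benchmark-controversy.md",
    "feedback/frontier/summary-best-practices.md",
    "feedback/localllm/qwen-3.5.md",
    "feedback/localllm/general-local-models.md",
    "feedback/localllm/tool-calling-reliability.md",
    "feedback/localllm/github-issues-summary.md",
    "feedback/localllm/minimax-glm-deepseek.md",
    "feedback/localllm/installation-platform-issues.md",
]


def missing_files(root: Path) -> list[str]:
    """Return the expected relative paths that are not regular files under root."""
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    # Assumes the folder lives at ./forgecode (hypothetical location).
    for rel in missing_files(Path("forgecode")):
        print(f"missing: {rel}")
```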
Key Findings Summary
Strengths
- Speed: 3x faster than Claude Code on identical tasks (Opus 4.6)
- Multi-model: 300+ models via OpenRouter
- Open source: Apache 2.0, auditable
- Context efficiency: ~90% reduction vs full-file inclusion
Weaknesses
- Privacy concerns: Telemetry collects SSH/git data by default
- Feature gaps: No checkpoints, auto-memory, or IDE extensions
- Benchmark questions: Self-reported scores differ from independent validation
- GPT 5.4 stability: "Borderline unusable" despite 81.8% benchmark score
Critical Issues
- #2894: Multiple system messages break Qwen 3.5 and similar models
- #1318: Telemetry collection concerns
- #2893: Ghostty terminal resize bug
Model Recommendations
Best Overall Experience
- Claude Opus 4.6 - Fast, stable, reliable
Best Value
- MiniMax M2.1 - 47.9% score at $0.30 input / $1.20 output per million tokens
Avoid
- GPT 5.4 through ForgeCode - Tool calling failures
- Qwen 3.5 - Broken by #2894 until fixed
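To make the "Best Value" pricing concrete, here is a rough cost sketch. It assumes the two listed prices are USD per million input tokens and per million output tokens respectively; the function name and the session sizes are hypothetical.

```python
# Prices from the "Best Value" entry, read here as USD per million
# input tokens / output tokens (that reading is an assumption).
INPUT_USD_PER_MTOK = 0.30
OUTPUT_USD_PER_MTOK = 1.20


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one session in USD from its token counts."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical session: 2M input tokens, 200k output tokens.
    print(f"${estimate_cost_usd(2_000_000, 200_000):.2f}")  # prints $0.84
```

Because agentic coding sessions tend to resend context on every turn, input tokens usually dominate the total, so the input price matters more than the headline output price.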
Quick Links
- Repository: https://github.com/antinomyhq/forgecode
- Documentation: https://forgecode.dev/docs/
- Discord: https://discord.gg/kRZBPpkgwq
- Terminal-Bench 2.0 Leaderboard: https://tbench.ai/leaderboard/terminal-bench/2.0
Feedback Format
Each feedback file includes:
- Model used (name, size, provider)
- Benchmark results or task performance
- Issues encountered
- What worked well
- Source reference (URL or site)
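The checklist above can be sketched as a small record type, which is handy when checking that a feedback file is complete. The class, field names, and helper are hypothetical; the feedback files themselves are plain markdown and do not define a schema.

```python
from dataclasses import dataclass, field, fields


@dataclass
class FeedbackEntry:
    """One feedback record; field names are illustrative, not a fixed schema."""
    model_name: str
    model_size: str
    provider: str
    results: str                                           # benchmark results or task performance
    issues: list[str] = field(default_factory=list)        # issues encountered
    worked_well: list[str] = field(default_factory=list)   # what worked well
    source: str = ""                                       # URL or site


def empty_fields(entry: FeedbackEntry) -> list[str]:
    """Return the names of fields that are still empty, in declaration order."""
    return [f.name for f in fields(entry) if not getattr(entry, f.name)]


if __name__ == "__main__":
    # Placeholder values only; not taken from any real feedback file.
    draft = FeedbackEntry(
        model_name="example-model",
        model_size="unknown",
        provider="example-provider",
        results="task completed",
    )
    print(empty_fields(draft))  # ['issues', 'worked_well', 'source']
```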
Compiled from:
- GitHub issues (48 open, 433 closed)
- Reddit discussions (r/ClaudeCode, r/cursor, r/LocalLLaMA)
- DEV Community articles
- ForgeCode blog posts
- Independent benchmark sites (llm-stats.com)
- Academic papers (arXiv)