# ForgeCode Research & Analysis Folder

**Last Updated:** April 9, 2026

This folder contains comprehensive research and analysis of the **ForgeCode** coding harness from antinomyhq.

---

## Folder Structure

```
forgecode/
├── feedback/
│   ├── frontier/                  # Frontier/closed-weight model feedback
│   │   ├── claude-opus-4.6.md
│   │   ├── gpt-5.4.md
│   │   ├── gemini-3.1-pro.md
│   │   ├── privacy-security-concerns.md
│   │   ├── pricing-model.md
│   │   ├── feature-comparison-ecosystem.md
│   │   ├── benchmark-controversy.md
│   │   └── summary-best-practices.md
│   └── localllm/                  # Local/open-weight model feedback
│       ├── qwen-3.5.md
│       ├── general-local-models.md
│       ├── tool-calling-reliability.md
│       ├── github-issues-summary.md
│       ├── minimax-glm-deepseek.md
│       └── installation-platform-issues.md
└── README.md                      # This file
```

---

## Key Findings Summary

### Strengths

- **Speed:** 3x faster than Claude Code on identical tasks (Opus 4.6)
- **Multi-model:** 300+ models via OpenRouter
- **Open source:** Apache 2.0, auditable
- **Context efficiency:** ~90% reduction vs. full-file inclusion

### Weaknesses

- **Privacy concerns:** Telemetry collects SSH/git data by default
- **Feature gaps:** No checkpoints, auto-memory, or IDE extensions
- **Benchmark questions:** Self-reported scores differ from independent validation
- **GPT 5.4 stability:** "Borderline unusable" despite an 81.8% benchmark score

### Critical Issues

1. **#2894:** Multiple system messages break Qwen 3.5 and similar models
2. **#1318:** Telemetry collection concerns
3. **#2893:** Ghostty terminal resize bug

---

## Model Recommendations

### Best Overall Experience

- **Claude Opus 4.6** - Fast, stable, and reliable

### Best Value

- **MiniMax M2.1** - 47.9% score at $0.30/$1.20 per million tokens

### Avoid

- **GPT 5.4** through ForgeCode - Tool-calling failures
- **Qwen 3.5** - Broken by #2894 until fixed

---

## Quick Links

- **Repository:** https://github.com/antinomyhq/forgecode
- **Documentation:** https://forgecode.dev/docs/
- **Discord:** https://discord.gg/kRZBPpkgwq
- **TermBench Leaderboard:** https://tbench.ai/leaderboard/terminal-bench/2.0

---

## Feedback Format

Each feedback file includes:

- Model used (name, size, provider)
- Benchmark results or task performance
- Issues encountered
- What worked well
- Source reference (URL or site)

---

## Sources

Compiled from:

- GitHub issues (48 open, 433 closed)
- Reddit discussions (r/ClaudeCode, r/cursor, r/LocalLLaMA)
- DEV Community articles
- ForgeCode blog posts
- Independent benchmark sites (llm-stats.com)
- Academic papers (arXiv)