ForgeCode Research & Analysis Folder

Last Updated: April 9, 2026

This folder contains comprehensive research and analysis of the ForgeCode coding harness from antinomyhq.

Folder Structure

forgecode/
├── feedback/
│   ├── frontier/          # Frontier/closed-weight model feedback
│   │   ├── claude-opus-4.6.md
│   │   ├── gpt-5.4.md
│   │   ├── gemini-3.1-pro.md
│   │   ├── privacy-security-concerns.md
│   │   ├── pricing-model.md
│   │   ├── feature-comparison-ecosystem.md
│   │   ├── benchmark-controversy.md
│   │   └── summary-best-practices.md
│   └── localllm/          # Local/open-weight model feedback
│       ├── qwen-3.5.md
│       ├── general-local-models.md
│       ├── tool-calling-reliability.md
│       ├── github-issues-summary.md
│       ├── minimax-glm-deepseek.md
│       └── installation-platform-issues.md
└── README.md              # This file

Key Findings Summary

Strengths

Speed: Reportedly 3x faster than Claude Code on identical tasks when running Opus 4.6
Multi-model: 300+ models via OpenRouter
Open source: Apache 2.0, auditable
Context efficiency: ~90% less context used than with full-file inclusion

Weaknesses

Privacy concerns: Telemetry collects SSH/git data by default
Feature gaps: No checkpoints, auto-memory, or IDE extensions
Benchmark questions: Self-reported scores differ from independent validation
GPT 5.4 stability: "Borderline unusable" despite 81.8% benchmark score

Critical Issues

#2894: Multiple system messages break Qwen 3.5 and similar models
#1318: Telemetry collection concerns
#2893: Ghostty terminal resize bug

Model Recommendations

Best Overall Experience

Claude Opus 4.6 - Fast, stable, reliable

Best Value

MiniMax M2.1 - 47.9% score at $0.30/$1.20 per million tokens
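
To make the price pair concrete, here is a rough cost estimate. It assumes, which the line above does not state, that $0.30 is the price per million input tokens and $1.20 the price per million output tokens. The function name and the token counts are made up for illustration.

```python
# Assumption: "$0.30/$1.20" means input/output price in USD per million tokens.
INPUT_USD_PER_MTOK = 0.30
OUTPUT_USD_PER_MTOK = 1.20


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one session from its token counts."""
    input_cost = (input_tokens / 1_000_000) * INPUT_USD_PER_MTOK
    output_cost = (output_tokens / 1_000_000) * OUTPUT_USD_PER_MTOK
    return input_cost + output_cost


# Hypothetical session: 2M input tokens and 500k output tokens.
print(f"${estimate_cost_usd(2_000_000, 500_000):.2f}")  # $1.20
```

If the two prices mean something else (for example cached versus uncached input), only the two constants need to change.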

Avoid

GPT 5.4 through ForgeCode - Tool calling failures
Qwen 3.5 - Broken by #2894 until fixed

Quick Links

Repository: https://github.com/antinomyhq/forgecode
Documentation: https://forgecode.dev/docs/
Discord: https://discord.gg/kRZBPpkgwq
TermBench Leaderboard: https://tbench.ai/leaderboard/terminal-bench/2.0

Feedback Format

Each feedback file includes:

Model used (name, size, provider)
Benchmark results or task performance
Issues encountered
What worked well
Source reference (URL or site)
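
The field list above can be read as a small record schema. The sketch below is one hypothetical way to model it in Python and to check a draft entry for gaps. The `FeedbackEntry` class, its field names, and the `missing_fields` helper are illustrative assumptions, not part of this folder or of ForgeCode.

```python
from dataclasses import dataclass, fields


@dataclass
class FeedbackEntry:
    """Hypothetical record mirroring the five items each feedback file includes."""
    model: str        # model used: name, size, provider
    results: str      # benchmark results or task performance
    issues: str       # issues encountered
    worked_well: str  # what worked well
    source: str       # source reference (URL or site)


def missing_fields(entry: FeedbackEntry) -> list[str]:
    """Return the names of fields that are empty or whitespace-only."""
    return [f.name for f in fields(entry) if not getattr(entry, f.name).strip()]


# Placeholder values only; they are not taken from any feedback file.
example = FeedbackEntry(
    model="example-model (size, provider)",
    results="",
    issues="none noted",
    worked_well="placeholder",
    source="https://example.com/thread",
)
print(missing_fields(example))  # ['results']
```

A check like this could be run over draft entries before they are added, so that every file carries a source reference.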

Last Updated

April 9, 2026

Compiled from:

GitHub issues (48 open, 433 closed)
Reddit discussions (r/ClaudeCode, r/cursor, r/LocalLLaMA)
DEV Community articles
ForgeCode blog posts
Independent benchmark sites (llm-stats.com)
Academic papers (arXiv)
