Add REPO_FEEDBACK.md files for opencode, hermes, forgecode, and pi-mono harnesses
# OpenCode Repository Analysis: Local Model Compatibility

**Date:** April 9, 2026

**Analysis Focus:** Prompts, Tools, Parsing, and Skills for Local/Smaller Models

**Source:** Code analysis of `opencode-ai/opencode` + Community feedback from GitHub, Reddit, Discord

---

## Executive Summary

OpenCode is a Go-based coding agent with heavy optimization for frontier models (Claude, GPT-4o). The codebase shows **strong architectural decisions** for general use but has **specific pain points for local models** that manifest as JSON parsing errors, tool calling failures, and context truncation issues.

**Verdict:** Works well with local models 27B+ (Qwen3.5 27B, Gemma 4 26B) with configuration adjustments. Smaller models (7B-14B) struggle due to prompt complexity and tool count.

---

## 1. PROMPTS Analysis

### 1.1 Prompt Structure

OpenCode uses **provider-specific prompts** with significant differences between Anthropic and OpenAI formats:

| File | Purpose | Lines | Strength for Local Models |
|------|---------|-------|---------------------------|
| `prompt/coder.go` | Main coding agent | ~220 | ⚠️ VERBOSE - Complex instructions |
| `prompt/task.go` | Sub-agent (search) | ~17 | ✅ GOOD - Minimal, focused |
| `prompt/summarizer.go` | Session summary | ~16 | ✅ GOOD - Simple directive |
| `prompt/title.go` | Session titling | ~13 | ✅ GOOD - Single task |

### 1.2 The Coder Prompt (Critical Analysis)

**Location:** `internal/llm/prompt/coder.go`

The `baseAnthropicCoderPrompt` is **excessively verbose** (~170 lines of instructions). Key sections:

```text
Sections that add token overhead:
- Tone and style (lines 86-93): ~400 tokens of verbosity constraints
- Examples section (lines 94-135): Multiple <example> blocks
- Proactiveness guidelines (lines 137-142)
- Following conventions (lines 144-149)
- Code style rules (lines 151-152)
- Doing tasks workflow (lines 155-162)
- Tool usage policy (lines 163-166)
```

**Strong Conclusion (Verified):**
- The prompt is designed for models with strong instruction-following (Claude 3.5+, GPT-4o)
- Local models 14B and smaller struggle to retain all of its constraints
- Community feedback confirms: "Qwen 3 14b fails" while "Qwen 3.5 27b works well"

**Weak Conclusion (Inference):**
- The verbosity may cause "instruction dilution," where smaller models fixate on early/late instructions and miss the middle constraints
- The example-heavy format (6+ examples) may be over-optimized for frontier models

### 1.3 What Works Well

✅ **Provider-aware prompting** - Different prompts for Anthropic vs OpenAI endpoints
✅ **Environment injection** - Dynamic context (working dir, git status, platform, date)
✅ **Project-specific context** - Auto-loading from `OpenCode.md` or configured paths
✅ **LSP integration hints** - Conditional diagnostics info only when LSP available
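
The environment-injection pattern is simple to reproduce; the helper below is a hypothetical sketch in the same spirit, not opencode's actual function (git status is omitted, since opencode shells out for it):

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"time"
)

// buildEnvContext assembles a dynamic context block similar in spirit to
// what coder.go injects: working directory, platform, and date.
func buildEnvContext() string {
	wd, _ := os.Getwd()
	return fmt.Sprintf(
		"<env>\nWorking directory: %s\nPlatform: %s\nToday's date: %s\n</env>",
		wd, runtime.GOOS, time.Now().Format("2006-01-02"),
	)
}

func main() {
	fmt.Println(buildEnvContext())
}
```

Because the block is rebuilt on every request, it stays accurate without any prompt engineering from the user.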

### 1.4 Problems for Local Models

❌ **Excessive constraints** - "You MUST..." appears 8+ times, creating conflicting priorities
❌ **Nested conditionals** - "If X then Y unless Z in which case..." structure
❌ **Implicit dependencies** - Assumes model can track multiple tool calls across turns

### 1.5 Community Evidence

> "Local models are more for vibe coding. Not really set for agentic coding. Unless you can host minimax2.5 to actually be worthwhile." — Reddit r/opencodeCLI

> "Qwen 3 14b - fails with hallucinations" vs "Qwen 3.5 27b Q3_XXS - 5.0% migration error, clear winner for local use" — Rost Glukhov benchmark

---

## 2. TOOLS Analysis

### 2.1 Tool Inventory

**Coder Agent Tools:** 11 core tools plus the `agent` meta-tool (12 total)

| Tool | Description Length | Params | Risk for Local Models |
|------|-------------------|--------|----------------------|
| `bash` | ~200 lines | 2 | ⚠️ HIGH - Complex bash description with git/PR instructions |
| `edit` | ~90 lines | 3 | ⚠️ MEDIUM - Requires precise string matching |
| `write` | ~60 lines | 2 | ✅ LOW - Straightforward |
| `view` | ~70 lines | 3 | ✅ LOW - Well-documented |
| `glob` | ~40 lines | 1 | ✅ LOW - Simple |
| `grep` | ~80 lines | 4 | ⚠️ MEDIUM - Regex/literal_text nuance |
| `ls` | ~30 lines | 2 | ✅ LOW - Simple |
| `fetch` | ~40 lines | 1 | ✅ LOW - Simple |
| `patch` | ~50 lines | 2 | ⚠️ MEDIUM - Requires understanding diff format |
| `sourcegraph` | ~30 lines | 1 | ✅ LOW - Simple |
| `diagnostics` | ~20 lines | 1 | ✅ LOW - Simple |
| `agent` | ~40 lines | 1 | ⚠️ MEDIUM - Meta-cognitive (sub-agent) |

### 2.2 Tool Description Problems

**CRITICAL ISSUE: Bash Tool Description**

Location: `internal/llm/tools/bash.go` lines 57-203

The bash tool description is **excessively long** (~3500 characters) and includes:
- Directory verification steps
- Security check procedures
- Command execution flow
- Output processing rules
- Git commit workflow (lines 97-151)
- PR creation workflow (lines 153-199)

**Strong Conclusion (Verified):**
- Community reports "invalid tool call message with wrong tool name" errors
- GitHub Issue #13982: GLM-5 "screwing up the JSON parsing" specifically on the read tool

**Weak Conclusion (Inference):**
- Combined with the system prompt, the tool descriptions may exceed the effective context window of 8K-16K models
- Local models may "lose track" of which tool they're calling due to description overload
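
A back-of-the-envelope budget illustrates the inference; the character counts and the ~4 chars/token heuristic below are assumptions for the sketch, not measurements from the codebase:

```go
package main

import "fmt"

// estimateTokens uses the common ~4 chars/token heuristic.
// This is an approximation, not opencode's actual accounting.
func estimateTokens(chars int) int { return chars / 4 }

func main() {
	systemPrompt := 170 * 60 // ~170 prompt lines at ~60 chars/line (assumed)
	bashTool := 3500         // bash description size, from the analysis above
	otherTools := 11 * 1200  // remaining tools at ~1200 chars each (assumed)
	overhead := systemPrompt + bashTool + otherTools
	fmt.Printf("static overhead: ~%d tokens\n", estimateTokens(overhead))
	// Against an 8K window, this fixed cost leaves little room for
	// conversation history, file contents, and tool outputs.
}
```

Even with generous rounding, several thousand tokens are spent before the model sees any user request.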

### 2.3 Tool Calling Issues (Community Verified)

GitHub Issue #4428 (36 comments): "Why is opencode not working with local llms via Ollama?"

> "After many issues with Ollama (mostly that all models default to a very small context window and you have to modify them or find versions with bigger context window settings, and tool call formatting issues), after installing LM Studio I was able to consistently use qwen/qwen3-30b-a3b-2507 with tools"

GitHub Issue #13982: "[bug] GLM 5 keeps screwing up the json parsing of read tool calling"

> "The AI keeps screwing up the JSON formatting for the tool calling. Sometimes I even get 'Method Not Allowed' errors that stops the build dead in the tracks."

---

## 3. PARSING Analysis

### 3.1 Tool Call Parsing Strategy

**Location:** `internal/llm/provider/openai.go`

OpenCode relies on **native function calling** via the OpenAI SDK:

```go
// Extracts tool calls from the API response.
// Assumes the provider returns well-formed JSON.
func (o *openaiClient) toolCalls(completion openai.ChatCompletion) []message.ToolCall {
	// ...
}
```

**Strong Conclusion (Verified):**
- Uses standard OpenAI function calling format (works with llama.cpp, vLLM, Ollama)
- No custom JSON parsing for tool arguments (relies on SDK/provider)

### 3.2 The Problem: Local Model Output

**Weak Conclusion (Inference from patterns):**

Local models often produce:
1. **Malformed JSON** - Trailing commas, unescaped quotes, missing braces
2. **Partial tool calls** - Starting JSON but not completing before max_tokens
3. **Invalid tool names** - Hallucinating tools that don't exist
4. **Parameter type mismatches** - Sending strings where numbers expected

**The codebase has NO resilience for:**
- JSON repair/relaxation
- Partial tool call streaming recovery
- Tool name fuzzy matching
- Parameter coercion
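
A minimal resilience layer along these lines might look like the following sketch (hypothetical helpers, not part of the opencode codebase):

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
	"strings"
)

// trailingComma matches a comma immediately before a closing brace/bracket,
// one of the most common local-model JSON malformations.
var trailingComma = regexp.MustCompile(`,\s*([}\]])`)

// repairJSON strips trailing commas before handing the string to the
// real parser.
func repairJSON(s string) string {
	return trailingComma.ReplaceAllString(s, "$1")
}

// matchToolName maps a hallucinated/misspelled name to a known tool by
// case-insensitive prefix match; returns "" when nothing is close.
func matchToolName(name string, known []string) string {
	n := strings.ToLower(name)
	for _, k := range known {
		lk := strings.ToLower(k)
		if strings.HasPrefix(n, lk) || strings.HasPrefix(lk, n) {
			return k
		}
	}
	return ""
}

func main() {
	raw := `{"file_path": "main.go", "limit": 100,}` // trailing comma
	var args map[string]any
	if err := json.Unmarshal([]byte(repairJSON(raw)), &args); err != nil {
		panic(err)
	}
	fmt.Println(args["file_path"]) // main.go
	fmt.Println(matchToolName("Bash_tool", []string{"bash", "edit", "view"})) // bash
}
```

Real-world repair would also need to handle unescaped quotes and truncated objects, but even this narrow pass would absorb a visible share of the community-reported failures.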

### 3.3 Context Window Truncation

**Strong Conclusion (Verified):**

GitHub Issue #1212: "Fetched documentation exceeds context window limit"

> "When opencode pulls documentation from websites, the resulting response can sometimes exceed the context length of the current model in use (currently Claude Sonnet 4 for me). It's impossible to continue this session in this case."

**Config gap:** `internal/llm/models/local.go` sets:

```go
ContextWindow: cmp.Or(model.LoadedContextLength, 4096), // Falls back to 4K!
```

Community fix (from a Medium article): increase the Ollama context window from 4K to 32K for reasonable performance.

---

## 4. SKILLS / SUB-AGENTS Analysis

### 4.1 Agent Tool Architecture

**Location:** `internal/llm/agent/agent-tool.go`

```go
const AgentToolName = "agent"

// Description emphasizes:
// - Parallel execution (good)
// - Stateless operation
// - Read-only (no bash/edit/write)
```

**Strong Conclusion (Verified):**
- Sub-agents use **TaskPrompt** (minimal) vs **CoderPrompt** (verbose)
- Task agent only gets: Glob, Grep, LS, Sourcegraph, View
- This is actually **good design** - search tasks don't need editing tools
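
The restriction amounts to a simple filter over the tool set; a sketch with hypothetical types (opencode's actual wiring differs):

```go
package main

import "fmt"

// Tool is a minimal stand-in for opencode's tool interface.
type Tool struct {
	Name     string
	Mutating bool // true for bash/edit/write-style tools
}

// readOnlySubset drops anything that can modify the workspace,
// mirroring how the task agent gets only search/read tools.
func readOnlySubset(all []Tool) []Tool {
	var out []Tool
	for _, t := range all {
		if !t.Mutating {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	all := []Tool{
		{"bash", true}, {"edit", true}, {"write", true},
		{"glob", false}, {"grep", false}, {"ls", false},
		{"sourcegraph", false}, {"view", false},
	}
	for _, t := range readOnlySubset(all) {
		fmt.Print(t.Name, " ")
	}
	// glob grep ls sourcegraph view
}
```

Beyond safety, the smaller tool list also shrinks the schema payload the sub-agent model has to parse, which matters most for local models.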

### 4.2 KV Cache Invalidation Issue

**Strong Conclusion (Verified):**

Reddit r/LocalLLaMA:

> "I tried opencode, it also works fine with qwen models but the kv cache was invalidated when working with gpt 120B model."

> "When a new sub agent is spun, the kv cache from parent is not reused so for the sub agent model processed the whole prompt again."

This is an **architectural limitation** - each agent spawns a new session/context.

### 4.3 Sub-Agent Loop Risk

**Weak Conclusion (Inference):**

The agent tool description says:

> "The agent's outputs should generally be trusted"

For local models, this trust may be misplaced:
- Sub-agent may return incomplete search results
- No verification loop in parent agent
- Can lead to cascading errors

---

## 5. LOCAL MODEL CONFIGURATION

### 5.1 Auto-Discovery (Good)

**Location:** `internal/llm/models/local.go`

```go
// Automatically discovers models from:
// - v1/models endpoint (OpenAI compatible)
// - api/v0/models endpoint (LM Studio)
// Sets defaults for all agents
```

✅ **Works well** - No manual model registration needed for local endpoints

### 5.2 Context Window Defaults (Bad)

```go
ContextWindow: cmp.Or(model.LoadedContextLength, 4096),
```

❌ **4K fallback is too small** for the verbose prompts + tool descriptions

Community workaround:

```json
// ~/.config/opencode/opencode.json
{
  "provider": {
    "ollama": {
      "models": {
        "qwen3:32b": {
          "contextLength": 32768 // Manually override
        }
      }
    }
  }
}
```

---

## 6. RECOMMENDATIONS

### 6.1 Strong Recommendations (Based on Verified Feedback)

1. **Use 27B+ models minimum** for reliable tool calling
   - Qwen 3.5 27B (Q3_XXS or Q4_K_XL)
   - Gemma 4 26B (IQ4_XS)
   - Avoid 14B and smaller for complex tasks

2. **Set context window to 32K minimum** for local models
   - Default 4K is insufficient for OpenCode's verbose prompts

3. **Use LM Studio or llama.cpp over Ollama** for better tool calling
   - Community reports more consistent results
   - May relate to chat template handling

4. **Correct chat templates required**
   - Default Qwen3.5 template causes 500 errors
   - Use corrected template from community gist

### 6.2 Weak Recommendations (Inferred from Analysis)

1. **Prompt compression** would benefit local models:
   - Remove redundant examples from CoderPrompt
   - Shorten tool descriptions (especially bash)
   - Consider "instruction hierarchy" formatting

2. **Tool description tiering**:
   - "Essential" tools for small models
   - "Extended" tools for large models

3. **JSON resilience layer**:
   - Partial JSON repair
   - Tool name fuzzy matching
   - Parameter type coercion
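
The tiering idea could be prototyped as a size-gated tool registry; the tool split and the 24B cutoff below are illustrative assumptions echoing the benchmark section, not opencode behavior:

```go
package main

import "fmt"

// toolsForModel returns an "essential" subset for small models and the
// full set for large ones. The 24B threshold is a heuristic taken from
// the benchmark data, not a measured property of any harness.
func toolsForModel(paramsB int) []string {
	essential := []string{"bash", "edit", "view", "grep", "ls"}
	extended := []string{"glob", "fetch", "patch", "sourcegraph", "diagnostics", "agent", "write"}
	if paramsB < 24 {
		return essential
	}
	return append(essential, extended...)
}

func main() {
	fmt.Println(len(toolsForModel(14))) // 5  (essential only)
	fmt.Println(len(toolsForModel(27))) // 12 (full set)
}
```

A 14B model would then see five short schemas instead of twelve long ones, directly attacking the description-overload failure mode from section 2.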

### 6.3 Architecture Observations

**Good for Local Models:**
- Stateless sub-agents prevent context overflow
- Minimal TaskPrompt for search operations
- Auto-discovery of local endpoints
- Provider abstraction allows local/remote mixing

**Challenging for Local Models:**
- No prompt compression/tiering
- No tool subset selection
- No JSON repair for malformed calls
- No KV cache sharing between agents

---

## 7. BENCHMARK DATA

From Rost Glukhov's testing (March 2026):

| Model | IndexNow Task | Migration Error | Speed |
|-------|--------------|-----------------|-------|
| Qwen 3.5 27b Q3_XXS | ✅ Pass | 5.0% | 34 tok/s |
| Gemma 4 26B IQ4_XS | ✅ Pass | 6.2% | ~30 tok/s |
| Qwen 3 14b | ❌ Fail | — | — |
| GPT-OSS 20b | ❌ Fail | — | stalls |

**Threshold appears to be ~24B parameters** for reliable OpenCode operation.

---

## 8. SOURCE REFERENCES

- GitHub Issue #4428: Local LLM connection issues (36 comments)
- GitHub Issue #13982: GLM-5 JSON parsing failures
- GitHub Issue #1212: Context window overflow
- Reddit r/opencodeCLI: Local model recommendations
- Reddit r/LocalLLaMA: KV cache invalidation discussion
- Aayush Garg blog: Qwen3.5 + llama.cpp + OpenCode setup
- Rost Glukhov benchmark: Local LLM comparison

---

*Analysis conducted by examining opencode repository source code and synthesizing community feedback from multiple sources. Strong conclusions are backed by multiple verified reports; weak conclusions are reasoned inferences requiring further validation.*