Add REPO_FEEDBACK.md files for opencode, hermes, forgecode, and pi-mono harnesses
# OpenCode Repository Analysis: Local Model Compatibility

**Date:** April 9, 2026

**Analysis Focus:** Prompts, Tools, Parsing, and Skills for Local/Smaller Models

**Source:** Code analysis of `opencode-ai/opencode` + Community feedback from GitHub, Reddit, Discord

---

## Executive Summary

OpenCode is a Go-based coding agent with heavy optimization for frontier models (Claude, GPT-4o). The codebase shows **strong architectural decisions** for general use but has **specific pain points for local models** that manifest as JSON parsing errors, tool calling failures, and context truncation issues.

**Verdict:** Works well with local models 27B+ (Qwen3.5 27B, Gemma 4 26B) with configuration adjustments. Smaller models (7B-14B) struggle due to prompt complexity and tool count.

---

## 1. PROMPTS Analysis

### 1.1 Prompt Structure

OpenCode uses **provider-specific prompts** with significant differences between Anthropic and OpenAI formats:

| File | Purpose | Lines | Strength for Local Models |
|------|---------|-------|---------------------------|
| `prompt/coder.go` | Main coding agent | ~220 | ⚠️ VERBOSE - Complex instructions |
| `prompt/task.go` | Sub-agent (search) | ~17 | ✅ GOOD - Minimal, focused |
| `prompt/summarizer.go` | Session summary | ~16 | ✅ GOOD - Simple directive |
| `prompt/title.go` | Session titling | ~13 | ✅ GOOD - Single task |

### 1.2 The Coder Prompt (Critical Analysis)

**Location:** `internal/llm/prompt/coder.go`

The `baseAnthropicCoderPrompt` is **excessively verbose** (~170 lines of instructions). Key sections:

```text
Sections that add token overhead:
- Tone and style (lines 86-93): ~400 tokens of verbosity constraints
- Examples section (lines 94-135): Multiple <example> blocks
- Proactiveness guidelines (lines 137-142)
- Following conventions (lines 144-149)
- Code style rules (lines 151-152)
- Doing tasks workflow (lines 155-162)
- Tool usage policy (lines 163-166)
```

**Strong Conclusion (Verified):**
- The prompt is designed for models with strong instruction-following (Claude 3.5+, GPT-4o)
- Local models 14B and smaller struggle to retain all of its constraints
- Community feedback confirms: "Qwen 3 14b fails" while "Qwen 3.5 27b works well"

**Weak Conclusion (Inference):**
- The verbosity may cause "instruction dilution," where smaller models fixate on early/late instructions and miss the middle constraints
- The example-heavy format (6+ examples) may be over-optimized for frontier models

### 1.3 What Works Well

✅ **Provider-aware prompting** - Different prompts for Anthropic vs OpenAI endpoints
✅ **Environment injection** - Dynamic context (working dir, git status, platform, date)
✅ **Project-specific context** - Auto-loading from `OpenCode.md` or configured paths
✅ **LSP integration hints** - Conditional diagnostics info only when LSP available
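
The environment-injection pattern is simple to reproduce; the helper below is a hypothetical sketch in the same spirit, not opencode's actual function (git status is omitted, since opencode shells out for it):

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"time"
)

// buildEnvContext assembles a dynamic context block similar in spirit to
// what coder.go injects: working directory, platform, and date.
func buildEnvContext() string {
	wd, _ := os.Getwd()
	return fmt.Sprintf(
		"<env>\nWorking directory: %s\nPlatform: %s\nToday's date: %s\n</env>",
		wd, runtime.GOOS, time.Now().Format("2006-01-02"),
	)
}

func main() {
	fmt.Println(buildEnvContext())
}
```

Because the block is rebuilt on every request, it stays accurate without any prompt engineering from the user.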

### 1.4 Problems for Local Models

❌ **Excessive constraints** - "You MUST..." appears 8+ times, creating conflicting priorities
❌ **Nested conditionals** - "If X then Y unless Z in which case..." structure
❌ **Implicit dependencies** - Assumes model can track multiple tool calls across turns

### 1.5 Community Evidence

> "Local models are more for vibe coding. Not really set for agentic coding. Unless you can host minimax2.5 to actually be worthwhile." — Reddit r/opencodeCLI

> "Qwen 3 14b - fails with hallucinations" vs "Qwen 3.5 27b Q3_XXS - 5.0% migration error, clear winner for local use" — Rost Glukhov benchmark

---

## 2. TOOLS Analysis

### 2.1 Tool Inventory

**Coder Agent Tools:** 11 core tools plus the `agent` meta-tool (12 total)

| Tool | Description Length | Params | Risk for Local Models |
|------|-------------------|--------|----------------------|
| `bash` | ~200 lines | 2 | ⚠️ HIGH - Complex bash description with git/PR instructions |
| `edit` | ~90 lines | 3 | ⚠️ MEDIUM - Requires precise string matching |
| `write` | ~60 lines | 2 | ✅ LOW - Straightforward |
| `view` | ~70 lines | 3 | ✅ LOW - Well-documented |
| `glob` | ~40 lines | 1 | ✅ LOW - Simple |
| `grep` | ~80 lines | 4 | ⚠️ MEDIUM - Regex/literal_text nuance |
| `ls` | ~30 lines | 2 | ✅ LOW - Simple |
| `fetch` | ~40 lines | 1 | ✅ LOW - Simple |
| `patch` | ~50 lines | 2 | ⚠️ MEDIUM - Requires understanding diff format |
| `sourcegraph` | ~30 lines | 1 | ✅ LOW - Simple |
| `diagnostics` | ~20 lines | 1 | ✅ LOW - Simple |
| `agent` | ~40 lines | 1 | ⚠️ MEDIUM - Meta-cognitive (sub-agent) |

### 2.2 Tool Description Problems

**CRITICAL ISSUE: Bash Tool Description**

Location: `internal/llm/tools/bash.go` lines 57-203

The bash tool description is **excessively long** (~3500 characters) and includes:
- Directory verification steps
- Security check procedures
- Command execution flow
- Output processing rules
- Git commit workflow (lines 97-151)
- PR creation workflow (lines 153-199)

**Strong Conclusion (Verified):**
- Community reports "invalid tool call message with wrong tool name" errors
- GitHub Issue #13982: GLM-5 "screwing up the JSON parsing" specifically on the read tool

**Weak Conclusion (Inference):**
- Combined with the system prompt, the tool descriptions may exceed the effective context window of 8K-16K models
- Local models may "lose track" of which tool they're calling due to description overload
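
A back-of-the-envelope budget illustrates the inference; the character counts and the ~4 chars/token heuristic below are assumptions for the sketch, not measurements from the codebase:

```go
package main

import "fmt"

// estimateTokens uses the common ~4 chars/token heuristic.
// This is an approximation, not opencode's actual accounting.
func estimateTokens(chars int) int { return chars / 4 }

func main() {
	systemPrompt := 170 * 60 // ~170 prompt lines at ~60 chars/line (assumed)
	bashTool := 3500         // bash description size, from the analysis above
	otherTools := 11 * 1200  // remaining tools at ~1200 chars each (assumed)
	overhead := systemPrompt + bashTool + otherTools
	fmt.Printf("static overhead: ~%d tokens\n", estimateTokens(overhead))
	// Against an 8K window, this fixed cost leaves little room for
	// conversation history, file contents, and tool outputs.
}
```

Even with generous rounding, several thousand tokens are spent before the model sees any user request.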

### 2.3 Tool Calling Issues (Community Verified)

GitHub Issue #4428 (36 comments): "Why is opencode not working with local llms via Ollama?"

> "After many issues with Ollama (mostly that all models default to a very small context window and you have to modify them or find versions with bigger context window settings, and tool call formatting issues), after installing LM Studio I was able to consistently use qwen/qwen3-30b-a3b-2507 with tools"

GitHub Issue #13982: "[bug] GLM 5 keeps screwing up the json parsing of read tool calling"

> "The AI keeps screwing up the JSON formatting for the tool calling. Sometimes I even get 'Method Not Allowed' errors that stops the build dead in the tracks."

---

## 3. PARSING Analysis

### 3.1 Tool Call Parsing Strategy

**Location:** `internal/llm/provider/openai.go`

OpenCode relies on **native function calling** via the OpenAI SDK:

```go
// Extracts tool calls from the API response.
// Assumes the provider returns well-formed JSON.
func (o *openaiClient) toolCalls(completion openai.ChatCompletion) []message.ToolCall {
	// ...
}
```

**Strong Conclusion (Verified):**
- Uses standard OpenAI function calling format (works with llama.cpp, vLLM, Ollama)
- No custom JSON parsing for tool arguments (relies on SDK/provider)

### 3.2 The Problem: Local Model Output

**Weak Conclusion (Inference from patterns):**

Local models often produce:
1. **Malformed JSON** - Trailing commas, unescaped quotes, missing braces
2. **Partial tool calls** - Starting JSON but not completing before max_tokens
3. **Invalid tool names** - Hallucinating tools that don't exist
4. **Parameter type mismatches** - Sending strings where numbers expected

**The codebase has NO resilience for:**
- JSON repair/relaxation
- Partial tool call streaming recovery
- Tool name fuzzy matching
- Parameter coercion
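
A minimal resilience layer along these lines might look like the following sketch (hypothetical helpers, not part of the opencode codebase):

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
	"strings"
)

// trailingComma matches a comma immediately before a closing brace/bracket,
// one of the most common local-model JSON malformations.
var trailingComma = regexp.MustCompile(`,\s*([}\]])`)

// repairJSON strips trailing commas before handing the string to the
// real parser.
func repairJSON(s string) string {
	return trailingComma.ReplaceAllString(s, "$1")
}

// matchToolName maps a hallucinated/misspelled name to a known tool by
// case-insensitive prefix match; returns "" when nothing is close.
func matchToolName(name string, known []string) string {
	n := strings.ToLower(name)
	for _, k := range known {
		lk := strings.ToLower(k)
		if strings.HasPrefix(n, lk) || strings.HasPrefix(lk, n) {
			return k
		}
	}
	return ""
}

func main() {
	raw := `{"file_path": "main.go", "limit": 100,}` // trailing comma
	var args map[string]any
	if err := json.Unmarshal([]byte(repairJSON(raw)), &args); err != nil {
		panic(err)
	}
	fmt.Println(args["file_path"]) // main.go
	fmt.Println(matchToolName("Bash_tool", []string{"bash", "edit", "view"})) // bash
}
```

Real-world repair would also need to handle unescaped quotes and truncated objects, but even this narrow pass would absorb a visible share of the community-reported failures.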

### 3.3 Context Window Truncation

**Strong Conclusion (Verified):**

GitHub Issue #1212: "Fetched documentation exceeds context window limit"

> "When opencode pulls documentation from websites, the resulting response can sometimes exceed the context length of the current model in use (currently Claude Sonnet 4 for me). It's impossible to continue this session in this case."

**Config gap:** `internal/llm/models/local.go` sets:

```go
ContextWindow: cmp.Or(model.LoadedContextLength, 4096), // Falls back to 4K!
```

Community fix (from a Medium article): increase the Ollama context window from 4K to 32K for reasonable performance.

---

## 4. SKILLS / SUB-AGENTS Analysis

### 4.1 Agent Tool Architecture

**Location:** `internal/llm/agent/agent-tool.go`

```go
const AgentToolName = "agent"

// Description emphasizes:
// - Parallel execution (good)
// - Stateless operation
// - Read-only (no bash/edit/write)
```

**Strong Conclusion (Verified):**
- Sub-agents use **TaskPrompt** (minimal) vs **CoderPrompt** (verbose)
- Task agent only gets: Glob, Grep, LS, Sourcegraph, View
- This is actually **good design** - search tasks don't need editing tools
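
The restriction amounts to a simple filter over the tool set; a sketch with hypothetical types (opencode's actual wiring differs):

```go
package main

import "fmt"

// Tool is a minimal stand-in for opencode's tool interface.
type Tool struct {
	Name     string
	Mutating bool // true for bash/edit/write-style tools
}

// readOnlySubset drops anything that can modify the workspace,
// mirroring how the task agent gets only search/read tools.
func readOnlySubset(all []Tool) []Tool {
	var out []Tool
	for _, t := range all {
		if !t.Mutating {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	all := []Tool{
		{"bash", true}, {"edit", true}, {"write", true},
		{"glob", false}, {"grep", false}, {"ls", false},
		{"sourcegraph", false}, {"view", false},
	}
	for _, t := range readOnlySubset(all) {
		fmt.Print(t.Name, " ")
	}
	// glob grep ls sourcegraph view
}
```

Beyond safety, the smaller tool list also shrinks the schema payload the sub-agent model has to parse, which matters most for local models.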

### 4.2 KV Cache Invalidation Issue

**Strong Conclusion (Verified):**

Reddit r/LocalLLaMA:

> "I tried opencode, it also works fine with qwen models but the kv cache was invalidated when working with gpt 120B model."

> "When a new sub agent is spun, the kv cache from parent is not reused so for the sub agent model processed the whole prompt again."

This is an **architectural limitation** - each agent spawns a new session/context.

### 4.3 Sub-Agent Loop Risk

**Weak Conclusion (Inference):**

The agent tool description says:

> "The agent's outputs should generally be trusted"

For local models, this trust may be misplaced:
- Sub-agent may return incomplete search results
- No verification loop in parent agent
- Can lead to cascading errors

---

## 5. LOCAL MODEL CONFIGURATION

### 5.1 Auto-Discovery (Good)

**Location:** `internal/llm/models/local.go`

```go
// Automatically discovers models from:
// - v1/models endpoint (OpenAI compatible)
// - api/v0/models endpoint (LM Studio)
// Sets defaults for all agents
```

✅ **Works well** - No manual model registration needed for local endpoints

### 5.2 Context Window Defaults (Bad)

```go
ContextWindow: cmp.Or(model.LoadedContextLength, 4096),
```

❌ **4K fallback is too small** for the verbose prompts + tool descriptions

Community workaround:

```json
// ~/.config/opencode/opencode.json
{
  "provider": {
    "ollama": {
      "models": {
        "qwen3:32b": {
          "contextLength": 32768 // Manually override
        }
      }
    }
  }
}
```

---

## 6. RECOMMENDATIONS

### 6.1 Strong Recommendations (Based on Verified Feedback)

1. **Use 27B+ models minimum** for reliable tool calling
   - Qwen 3.5 27B (Q3_XXS or Q4_K_XL)
   - Gemma 4 26B (IQ4_XS)
   - Avoid 14B and smaller for complex tasks

2. **Set context window to 32K minimum** for local models
   - Default 4K is insufficient for OpenCode's verbose prompts

3. **Use LM Studio or llama.cpp over Ollama** for better tool calling
   - Community reports more consistent results
   - May relate to chat template handling

4. **Correct chat templates required**
   - Default Qwen3.5 template causes 500 errors
   - Use corrected template from community gist

### 6.2 Weak Recommendations (Inferred from Analysis)

1. **Prompt compression** would benefit local models:
   - Remove redundant examples from CoderPrompt
   - Shorten tool descriptions (especially bash)
   - Consider "instruction hierarchy" formatting

2. **Tool description tiering**:
   - "Essential" tools for small models
   - "Extended" tools for large models

3. **JSON resilience layer**:
   - Partial JSON repair
   - Tool name fuzzy matching
   - Parameter type coercion
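
The tiering idea could be prototyped as a size-gated tool registry; the tool split and the 24B cutoff below are illustrative assumptions echoing the benchmark section, not opencode behavior:

```go
package main

import "fmt"

// toolsForModel returns an "essential" subset for small models and the
// full set for large ones. The 24B threshold is a heuristic taken from
// the benchmark data, not a measured property of any harness.
func toolsForModel(paramsB int) []string {
	essential := []string{"bash", "edit", "view", "grep", "ls"}
	extended := []string{"glob", "fetch", "patch", "sourcegraph", "diagnostics", "agent", "write"}
	if paramsB < 24 {
		return essential
	}
	return append(essential, extended...)
}

func main() {
	fmt.Println(len(toolsForModel(14))) // 5  (essential only)
	fmt.Println(len(toolsForModel(27))) // 12 (full set)
}
```

A 14B model would then see five short schemas instead of twelve long ones, directly attacking the description-overload failure mode from section 2.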

### 6.3 Architecture Observations

**Good for Local Models:**
- Stateless sub-agents prevent context overflow
- Minimal TaskPrompt for search operations
- Auto-discovery of local endpoints
- Provider abstraction allows local/remote mixing

**Challenging for Local Models:**
- No prompt compression/tiering
- No tool subset selection
- No JSON repair for malformed calls
- No KV cache sharing between agents

---

## 7. BENCHMARK DATA

From Rost Glukhov's testing (March 2026):

| Model | IndexNow Task | Migration Error | Speed |
|-------|--------------|-----------------|-------|
| Qwen 3.5 27b Q3_XXS | ✅ Pass | 5.0% | 34 tok/s |
| Gemma 4 26B IQ4_XS | ✅ Pass | 6.2% | ~30 tok/s |
| Qwen 3 14b | ❌ Fail | — | — |
| GPT-OSS 20b | ❌ Fail | — | stalls |

**Threshold appears to be ~24B parameters** for reliable OpenCode operation.

---

## 8. SOURCE REFERENCES

- GitHub Issue #4428: Local LLM connection issues (36 comments)
- GitHub Issue #13982: GLM-5 JSON parsing failures
- GitHub Issue #1212: Context window overflow
- Reddit r/opencodeCLI: Local model recommendations
- Reddit r/LocalLLaMA: KV cache invalidation discussion
- Aayush Garg blog: Qwen3.5 + llama.cpp + OpenCode setup
- Rost Glukhov benchmark: Local LLM comparison

---

*Analysis conducted by examining opencode repository source code and synthesizing community feedback from multiple sources. Strong conclusions are backed by multiple verified reports; weak conclusions are reasoned inferences requiring further validation.*