# OpenCode Repository Analysis: Local Model Compatibility
**Date:** April 9, 2026
**Analysis Focus:** Prompts, Tools, Parsing, and Skills for Local/Smaller Models
**Source:** Code analysis of `opencode-ai/opencode` + Community feedback from GitHub, Reddit, Discord
---
## Executive Summary
OpenCode is a Go-based coding agent with heavy optimization for frontier models (Claude, GPT-4o). The codebase shows **strong architectural decisions** for general use but has **specific pain points for local models** that manifest as JSON parsing errors, tool calling failures, and context truncation issues.
**Verdict:** Works well with local models of 27B+ parameters (Qwen3.5 27B, Gemma 4 26B) given configuration adjustments. Smaller models (7B-14B) struggle with the prompt complexity and tool count.
---
## 1. PROMPTS Analysis
### 1.1 Prompt Structure
OpenCode uses **provider-specific prompts** with significant differences between Anthropic and OpenAI formats:
| File | Purpose | Lines | Strength for Local Models |
|------|---------|-------|---------------------------|
| `prompt/coder.go` | Main coding agent | ~220 | ⚠️ VERBOSE - Complex instructions |
| `prompt/task.go` | Sub-agent (search) | ~17 | ✅ GOOD - Minimal, focused |
| `prompt/summarizer.go` | Session summary | ~16 | ✅ GOOD - Simple directive |
| `prompt/title.go` | Session titling | ~13 | ✅ GOOD - Single task |
### 1.2 The Coder Prompt (Critical Analysis)
**Location:** `internal/llm/prompt/coder.go`
The `baseAnthropicCoderPrompt` is **excessively verbose** (~170 lines of instructions). Key sections:
```text
Sections that add token overhead:
- Tone and style (lines 86-93): ~400 tokens of verbosity constraints
- Examples section (lines 94-135): multiple <example> blocks
- Proactiveness guidelines (lines 137-142)
- Following conventions (lines 144-149)
- Code style rules (lines 151-152)
- Doing tasks workflow (lines 155-162)
- Tool usage policy (lines 163-166)
```
**Strong Conclusion (Verified):**
- The prompt is designed for models with strong instruction-following (Claude 3.5+, GPT-4o)
- Local models 14B and smaller struggle to retain all constraints
- Community feedback confirms: "Qwen 3 14b fails" while "Qwen 3.5 27b works well"
**Weak Conclusion (Inference):**
- The verbosity may cause "instruction dilution" where smaller models fixate on early/late instructions and miss middle constraints
- The example-heavy format (6+ examples) may be over-optimizing for frontier models
### 1.3 What Works Well
- **Provider-aware prompting** - Different prompts for Anthropic vs OpenAI endpoints
- **Environment injection** - Dynamic context (working dir, git status, platform, date)
- **Project-specific context** - Auto-loading from `OpenCode.md` or configured paths
- **LSP integration hints** - Conditional diagnostics info only when LSP available
### 1.4 Problems for Local Models
- **Excessive constraints** - "You MUST..." appears 8+ times, creating conflicting priorities
- **Nested conditionals** - "If X then Y unless Z in which case..." structure
- **Implicit dependencies** - Assumes the model can track multiple tool calls across turns
### 1.5 Community Evidence
> "Local models are more for vibe coding. Not really set for agentic coding. Unless you can host minimax2.5 to actually be worthwhile." — Reddit r/opencodeCLI
> "Qwen 3 14b - fails with hallucinations" vs "Qwen 3.5 27b Q3_XXS - 5.0% migration error, clear winner for local use" — Rost Glukhov benchmark
---
## 2. TOOLS Analysis
### 2.1 Tool Inventory
**Coder Agent Tools:** 11 core tools plus the `agent` sub-agent tool (12 total):
| Tool | Description Length | Params | Risk for Local Models |
|------|-------------------|--------|----------------------|
| `bash` | ~200 lines | 2 | ⚠️ HIGH - Complex bash description with git/PR instructions |
| `edit` | ~90 lines | 3 | ⚠️ MEDIUM - Requires precise string matching |
| `write` | ~60 lines | 2 | ✅ LOW - Straightforward |
| `view` | ~70 lines | 3 | ✅ LOW - Well-documented |
| `glob` | ~40 lines | 1 | ✅ LOW - Simple |
| `grep` | ~80 lines | 4 | ⚠️ MEDIUM - Regex/literal_text nuance |
| `ls` | ~30 lines | 2 | ✅ LOW - Simple |
| `fetch` | ~40 lines | 1 | ✅ LOW - Simple |
| `patch` | ~50 lines | 2 | ⚠️ MEDIUM - Requires understanding diff format |
| `sourcegraph` | ~30 lines | 1 | ✅ LOW - Simple |
| `diagnostics` | ~20 lines | 1 | ✅ LOW - Simple |
| `agent` | ~40 lines | 1 | ⚠️ MEDIUM - Meta-cognitive (sub-agent) |
### 2.2 Tool Description Problems
**CRITICAL ISSUE: Bash Tool Description**
Location: `internal/llm/tools/bash.go` lines 57-203
The bash tool description is **excessively long** (~3500 characters) and includes:
- Directory verification steps
- Security check procedures
- Command execution flow
- Output processing rules
- Git commit workflow (lines 97-151)
- PR creation workflow (lines 153-199)
**Strong Conclusion (Verified):**
- Community reports "invalid tool call message with wrong tool name" errors
- GitHub Issue #13982: GLM-5 "screwing up the JSON parsing", specifically on the read tool
**Weak Conclusion (Inference):**
- The tool descriptions may exceed effective context window for 8K-16K models when combined with prompts
- Local models may "lose track" of which tool they're calling due to description overload
### 2.3 Tool Calling Issues (Community Verified)
GitHub Issue #4428 (36 comments): "Why is opencode not working with local llms via Ollama?"
> "After many issues with Ollama (mostly that all models default to a very small context window and you have to modify them or find versions with bigger context window settings, and tool call formatting issues), after installing LM Studio I was able to consistently use qwen/qwen3-30b-a3b-2507 with tools"
GitHub Issue #13982: "[bug] GLM 5 keeps screwing up the json parsing of read tool calling"
> "The AI keeps screwing up the JSON formatting for the tool calling. Sometimes I even get 'Method Not Allowed' errors that stops the build dead in the tracks."
---
## 3. PARSING Analysis
### 3.1 Tool Call Parsing Strategy
**Location:** `internal/llm/provider/openai.go`
OpenCode relies on **native function calling** via the OpenAI SDK:
```go
func (o *openaiClient) toolCalls(completion openai.ChatCompletion) []message.ToolCall {
	// Extracts tool calls from the API response.
	// Assumes the provider returns well-formed JSON.
}
```
**Strong Conclusion (Verified):**
- Uses standard OpenAI function calling format (works with llama.cpp, vLLM, Ollama)
- No custom JSON parsing for tool arguments (relies on SDK/provider)
### 3.2 The Problem: Local Model Output
**Weak Conclusion (Inference from patterns):**
Local models often produce:
1. **Malformed JSON** - Trailing commas, unescaped quotes, missing braces
2. **Partial tool calls** - Starting JSON but not completing before max_tokens
3. **Invalid tool names** - Hallucinating tools that don't exist
4. **Parameter type mismatches** - Sending strings where numbers expected
**The codebase has NO resilience for:**
- JSON repair/relaxation
- Partial tool call streaming recovery
- Tool name fuzzy matching
- Parameter coercion
### 3.3 Context Window Truncation
**Strong Conclusion (Verified):**
GitHub Issue #1212: "Fetched documentation exceeds context window limit"
> "When opencode pulls documentation from websites, the resulting response can sometimes exceed the context length of the current model in use (currently Claude Sonnet 4 for me). It's impossible to continue this session in this case."
**Config gap:** `internal/llm/models/local.go` sets:
```go
ContextWindow: cmp.Or(model.LoadedContextLength, 4096), // Falls back to 4K!
```
Community fix (Medium article): increase the Ollama context window from the 4K default to 32K for reasonable performance.
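On the Ollama side, the usual shape of that fix is a custom Modelfile that raises `num_ctx` (the model tag below is illustrative):

```
FROM qwen3:32b
PARAMETER num_ctx 32768
```

Registering it with `ollama create qwen3-32k -f Modelfile` yields a variant that should report the larger context to clients.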
---
## 4. SKILLS / SUB-AGENTS Analysis
### 4.1 Agent Tool Architecture
**Location:** `internal/llm/agent/agent-tool.go`
```go
const AgentToolName = "agent"
// Description emphasizes:
// - Parallel execution (good)
// - Stateless operation
// - Read-only (no bash/edit/write)
```
**Strong Conclusion (Verified):**
- Sub-agents use **TaskPrompt** (minimal) vs **CoderPrompt** (verbose)
- Task agent only gets: Glob, Grep, LS, Sourcegraph, View
- This is actually **good design** - search tasks don't need editing tools
### 4.2 KV Cache Invalidation Issue
**Strong Conclusion (Verified):**
Reddit r/LocalLLaMA:
> "I tried opencode, it also works fine with qwen models but the kv cache was invalidated when working with gpt 120B model."
> "When a new sub agent is spun, the kv cache from parent is not reused so for the sub agent model processed the whole prompt again."
This is an **architectural limitation** - each agent spawns a new session/context.
### 4.3 Sub-Agent Loop Risk
**Weak Conclusion (Inference):**
The agent tool description says:
> "The agent's outputs should generally be trusted"
For local models, this trust may be misplaced:
- Sub-agent may return incomplete search results
- No verification loop in parent agent
- Can lead to cascading errors
---
## 5. LOCAL MODEL CONFIGURATION
### 5.1 Auto-Discovery (Good)
**Location:** `internal/llm/models/local.go`
```go
// Automatically discovers models from:
// - v1/models endpoint (OpenAI compatible)
// - api/v0/models endpoint (LM Studio)
// Sets defaults for all agents
```
**Works well** - No manual model registration needed for local endpoints
### 5.2 Context Window Defaults (Bad)
```go
ContextWindow: cmp.Or(model.LoadedContextLength, 4096),
```
**4K fallback is too small** for the verbose prompts + tool descriptions
Community workaround:
```json
// ~/.config/opencode/opencode.json
{
  "provider": {
    "ollama": {
      "models": {
        "qwen3:32b": {
          "contextLength": 32768 // Manually override
        }
      }
    }
  }
}
```
---
## 6. RECOMMENDATIONS
### 6.1 Strong Recommendations (Based on Verified Feedback)
1. **Use 27B+ models minimum** for reliable tool calling
   - Qwen 3.5 27B (Q3_XXS or Q4_K_XL)
   - Gemma 4 26B (IQ4_XS)
   - Avoid 14B and smaller for complex tasks
2. **Set context window to 32K minimum** for local models
   - Default 4K is insufficient for OpenCode's verbose prompts
3. **Use LM Studio or llama.cpp over Ollama** for better tool calling
   - Community reports more consistent results
   - May relate to chat template handling
4. **Correct chat templates required**
   - Default Qwen3.5 template causes 500 errors
   - Use corrected template from community gist
### 6.2 Weak Recommendations (Inferred from Analysis)
1. **Prompt compression** would benefit local models:
   - Remove redundant examples from CoderPrompt
   - Shorten tool descriptions (especially bash)
   - Consider "instruction hierarchy" formatting
2. **Tool description tiering**:
   - "Essential" tools for small models
   - "Extended" tools for large models
3. **JSON resilience layer**:
   - Partial JSON repair
   - Tool name fuzzy matching
   - Parameter type coercion
### 6.3 Architecture Observations
**Good for Local Models:**
- Stateless sub-agents prevent context overflow
- Minimal TaskPrompt for search operations
- Auto-discovery of local endpoints
- Provider abstraction allows local/remote mixing
**Challenging for Local Models:**
- No prompt compression/tiering
- No tool subset selection
- No JSON repair for malformed calls
- No KV cache sharing between agents
---
## 7. BENCHMARK DATA
From Rost Glukhov's testing (March 2026):
| Model | IndexNow Task | Migration Error | Speed |
|-------|--------------|-----------------|-------|
| Qwen 3.5 27b Q3_XXS | ✅ Pass | 5.0% | 34 tok/s |
| Gemma 4 26B IQ4_XS | ✅ Pass | 6.2% | ~30 tok/s |
| Qwen 3 14b | ❌ Fail | — | — |
| GPT-OSS 20b | ❌ Fail | — | stalls |
**Threshold appears to be ~24B parameters** for reliable OpenCode operation.
---
## 8. SOURCE REFERENCES
- GitHub Issue #4428: Local LLM connection issues (36 comments)
- GitHub Issue #13982: GLM-5 JSON parsing failures
- GitHub Issue #1212: Context window overflow
- Reddit r/opencodeCLI: Local model recommendations
- Reddit r/LocalLLaMA: KV cache invalidation discussion
- Aayush Garg blog: Qwen3.5 + llama.cpp + OpenCode setup
- Rost Glukhov benchmark: Local LLM comparison
---
*Analysis conducted by examining opencode repository source code and synthesizing community feedback from multiple sources. Strong conclusions are backed by multiple verified reports; weak conclusions are reasoned inferences requiring further validation.*