OpenCode Repository Analysis: Local Model Compatibility
Date: April 9, 2026
Analysis Focus: Prompts, Tools, Parsing, and Skills for Local/Smaller Models
Source: Code analysis of opencode-ai/opencode + Community feedback from GitHub, Reddit, Discord
Executive Summary
OpenCode is a Go-based coding agent with heavy optimization for frontier models (Claude, GPT-4o). The codebase shows strong architectural decisions for general use but has specific pain points for local models that manifest as JSON parsing errors, tool calling failures, and context truncation issues.
Verdict: Works well with local models 27B+ (Qwen3.5 27B, Gemma 4 26B) with configuration adjustments. Smaller models (7B-14B) struggle due to prompt complexity and tool count.
1. PROMPTS Analysis
1.1 Prompt Structure
OpenCode uses provider-specific prompts with significant differences between Anthropic and OpenAI formats:
| File | Purpose | Lines | Strength for Local Models |
|---|---|---|---|
| prompt/coder.go | Main coding agent | ~220 | ⚠️ VERBOSE - Complex instructions |
| prompt/task.go | Sub-agent (search) | ~17 | ✅ GOOD - Minimal, focused |
| prompt/summarizer.go | Session summary | ~16 | ✅ GOOD - Simple directive |
| prompt/title.go | Session titling | ~13 | ✅ GOOD - Single task |
1.2 The Coder Prompt (Critical Analysis)
Location: internal/llm/prompt/coder.go
The baseAnthropicCoderPrompt is excessively verbose (~170 lines of instructions). Key sections:
Sections that add token overhead:
- Tone and style (lines 86-93): ~400 tokens of verbosity constraints
- Examples section (lines 94-135): multiple `<example>` blocks
- Proactiveness guidelines (lines 137-142)
- Following conventions (lines 144-149)
- Code style rules (lines 151-152)
- "Doing tasks" workflow (lines 155-162)
- Tool usage policy (lines 163-166)
Strong Conclusion (Verified):
- The prompt is designed for models with strong instruction-following (Claude 3.5+, GPT-4o)
- Local models 14B and smaller struggle to retain all constraints
- Community feedback confirms: "Qwen 3 14b fails" while "Qwen 3.5 27b works well"
Weak Conclusion (Inference):
- The verbosity may cause "instruction dilution" where smaller models fixate on early/late instructions and miss middle constraints
- The example-heavy format (6+ examples) may be over-optimizing for frontier models
1.3 What Works Well
✅ Provider-aware prompting - Different prompts for Anthropic vs OpenAI endpoints
✅ Environment injection - Dynamic context (working dir, git status, platform, date)
✅ Project-specific context - Auto-loading from OpenCode.md or configured paths
✅ LSP integration hints - Conditional diagnostics info only when LSP available
1.4 Problems for Local Models
❌ Excessive constraints - "You MUST..." appears 8+ times, creating conflicting priorities
❌ Nested conditionals - "If X then Y unless Z in which case..." structure
❌ Implicit dependencies - Assumes model can track multiple tool calls across turns
1.5 Community Evidence
"Local models are more for vibe coding. Not really set for agentic coding. Unless you can host minimax2.5 to actually be worthwhile." — Reddit r/opencodeCLI
"Qwen 3 14b - fails with hallucinations" vs "Qwen 3.5 27b Q3_XXS - 5.0% migration error, clear winner for local use" — Rost Glukhov benchmark
2. TOOLS Analysis
2.1 Tool Inventory
Coder Agent Tools: 12 tools in total (11 core tools plus the agent sub-agent tool)
| Tool | Description Length | Params | Risk for Local Models |
|---|---|---|---|
| bash | ~200 lines | 2 | ⚠️ HIGH - Complex bash description with git/PR instructions |
| edit | ~90 lines | 3 | ⚠️ MEDIUM - Requires precise string matching |
| write | ~60 lines | 2 | ✅ LOW - Straightforward |
| view | ~70 lines | 3 | ✅ LOW - Well-documented |
| glob | ~40 lines | 1 | ✅ LOW - Simple |
| grep | ~80 lines | 4 | ⚠️ MEDIUM - Regex/literal_text nuance |
| ls | ~30 lines | 2 | ✅ LOW - Simple |
| fetch | ~40 lines | 1 | ✅ LOW - Simple |
| patch | ~50 lines | 2 | ⚠️ MEDIUM - Requires understanding diff format |
| sourcegraph | ~30 lines | 1 | ✅ LOW - Simple |
| diagnostics | ~20 lines | 1 | ✅ LOW - Simple |
| agent | ~40 lines | 1 | ⚠️ MEDIUM - Meta-cognitive (sub-agent) |
2.2 Tool Description Problems
CRITICAL ISSUE: Bash Tool Description
Location: internal/llm/tools/bash.go lines 57-203
The bash tool description is excessively long (~3500 characters) and includes:
- Directory verification steps
- Security check procedures
- Command execution flow
- Output processing rules
- Git commit workflow (lines 97-151)
- PR creation workflow (lines 153-199)
Strong Conclusion (Verified):
- Community reports "invalid tool call message with wrong tool name" errors
- GitHub Issue #13982: GLM-5 "screwing up the JSON parsing" specifically on read tool
Weak Conclusion (Inference):
- The tool descriptions may exceed effective context window for 8K-16K models when combined with prompts
- Local models may "lose track" of which tool they're calling due to description overload
2.3 Tool Calling Issues (Community Verified)
GitHub Issue #4428 (36 comments): "Why is opencode not working with local llms via Ollama?"
"After many issues with Ollama (mostly that all models default to a very small context window and you have to modify them or find versions with bigger context window settings, and tool call formatting issues), after installing LM Studio I was able to consistently use qwen/qwen3-30b-a3b-2507 with tools"
GitHub Issue #13982: "[bug] GLM 5 keeps screwing up the json parsing of read tool calling"
"The AI keeps screwing up the JSON formatting for the tool calling. Sometimes I even get 'Method Not Allowed' errors that stops the build dead in the tracks."
3. PARSING Analysis
3.1 Tool Call Parsing Strategy
Location: internal/llm/provider/openai.go
OpenCode relies on native function calling via the OpenAI SDK:
```go
func (o *openaiClient) toolCalls(completion openai.ChatCompletion) []message.ToolCall {
	// Extracts tool calls from the API response.
	// Assumes the provider returns well-formed JSON.
}
```
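Because the extraction path assumes well-formed JSON, a caller can at least verify that assumption cheaply before dispatch. A minimal sketch of such a check (the `ToolCall` struct below is a stand-in for OpenCode's actual `message.ToolCall` type, not the real definition):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ToolCall is a stand-in for OpenCode's message.ToolCall type.
type ToolCall struct {
	Name  string
	Input string // raw JSON arguments as returned by the provider
}

// validate rejects calls whose arguments are not syntactically valid JSON,
// so a malformed local-model response fails fast instead of at unmarshal time.
func validate(tc ToolCall) error {
	if !json.Valid([]byte(tc.Input)) {
		return fmt.Errorf("tool %q: arguments are not valid JSON: %s", tc.Name, tc.Input)
	}
	return nil
}

func main() {
	good := ToolCall{Name: "view", Input: `{"file_path": "main.go"}`}
	bad := ToolCall{Name: "view", Input: `{"file_path": "main.go",}`} // trailing comma

	fmt.Println(validate(good)) // <nil>
	fmt.Println(validate(bad))  // non-nil error describing the bad arguments
}
```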
Strong Conclusion (Verified):
- Uses standard OpenAI function calling format (works with llama.cpp, vLLM, Ollama)
- No custom JSON parsing for tool arguments (relies on SDK/provider)
3.2 The Problem: Local Model Output
Weak Conclusion (Inference from patterns):
Local models often produce:
- Malformed JSON - Trailing commas, unescaped quotes, missing braces
- Partial tool calls - Starting JSON but not completing before max_tokens
- Invalid tool names - Hallucinating tools that don't exist
- Parameter type mismatches - Sending strings where numbers expected
The codebase has NO resilience for:
- JSON repair/relaxation
- Partial tool call streaming recovery
- Tool name fuzzy matching
- Parameter coercion
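None of these mitigations exist in the codebase today. To make the gap concrete, here is a hypothetical pre-dispatch layer covering two of the four items: a crude trailing-comma repair and fuzzy tool-name matching. The tool list matches the inventory in section 2.1, but everything else (function names, the edit-distance cutoff) is illustrative, not OpenCode code:

```go
package main

import (
	"fmt"
	"strings"
)

var knownTools = []string{
	"bash", "edit", "write", "view", "glob", "grep",
	"ls", "fetch", "patch", "sourcegraph", "diagnostics", "agent",
}

// repairJSON applies one cheap relaxation: stripping trailing commas before a
// closing brace or bracket. (Naive: it would also mangle ",}" inside string
// values; real repair needs a tolerant parser.)
func repairJSON(s string) string {
	s = strings.ReplaceAll(s, ",}", "}")
	s = strings.ReplaceAll(s, ",]", "]")
	return s
}

// matchTool maps a possibly hallucinated tool name to the closest known tool
// by Levenshtein distance, rejecting anything more than 2 edits away.
func matchTool(name string) (string, bool) {
	best, bestDist := "", 3
	for _, t := range knownTools {
		if d := levenshtein(strings.ToLower(name), t); d < bestDist {
			best, bestDist = t, d
		}
	}
	return best, best != ""
}

// levenshtein is the standard dynamic-programming edit distance.
func levenshtein(a, b string) int {
	prev := make([]int, len(b)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(a); i++ {
		cur := make([]int, len(b)+1)
		cur[0] = i
		for j := 1; j <= len(b); j++ {
			cost := 1
			if a[i-1] == b[j-1] {
				cost = 0
			}
			cur[j] = min(min(prev[j]+1, cur[j-1]+1), prev[j-1]+cost)
		}
		prev = cur
	}
	return prev[len(b)]
}

func main() {
	fmt.Println(repairJSON(`{"path": "main.go",}`)) // {"path": "main.go"}

	name, ok := matchTool("bashh") // one-character slip from a local model
	fmt.Println(name, ok)          // bash true
}
```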
3.3 Context Window Truncation
Strong Conclusion (Verified):
GitHub Issue #1212: "Fetched documentation exceeds context window limit"
"When opencode pulls documentation from websites, the resulting response can sometimes exceed the context length of the current model in use (currently Claude Sonnet 4 for me). It's impossible to continue this session in this case."
Config gap: internal/llm/models/local.go sets:

```go
ContextWindow: cmp.Or(model.LoadedContextLength, 4096), // Falls back to 4K!
```
Community fix (Medium article): Must increase Ollama context from 4K to 32K for reasonable performance.
4. SKILLS / SUB-AGENTS Analysis
4.1 Agent Tool Architecture
Location: internal/llm/agent/agent-tool.go
```go
const AgentToolName = "agent"

// Description emphasizes:
// - Parallel execution (good)
// - Stateless operation
// - Read-only (no bash/edit/write)
```
Strong Conclusion (Verified):
- Sub-agents use TaskPrompt (minimal) vs CoderPrompt (verbose)
- Task agent only gets: Glob, Grep, LS, Sourcegraph, View
- This is actually good design - search tasks don't need editing tools
4.2 KV Cache Invalidation Issue
Strong Conclusion (Verified):
Reddit r/LocalLLaMA:
"I tried opencode, it also works fine with qwen models but the kv cache was invalidated when working with gpt 120B model."
"When a new sub agent is spun, the kv cache from parent is not reused so for the sub agent model processed the whole prompt again."
This is an architectural limitation - each agent spawns a new session/context.
4.3 Sub-Agent Loop Risk
Weak Conclusion (Inference):
The agent tool description says:
"The agent's outputs should generally be trusted"
For local models, this trust may be misplaced:
- Sub-agent may return incomplete search results
- No verification loop in parent agent
- Can lead to cascading errors
5. LOCAL MODEL CONFIGURATION
5.1 Auto-Discovery (Good)
Location: internal/llm/models/local.go
```go
// Automatically discovers models from:
// - v1/models endpoint (OpenAI compatible)
// - api/v0/models endpoint (LM Studio)
// Sets defaults for all agents
```
✅ Works well - No manual model registration needed for local endpoints
5.2 Context Window Defaults (Bad)
```go
ContextWindow: cmp.Or(model.LoadedContextLength, 4096),
```
❌ 4K fallback is too small for the verbose prompts + tool descriptions
Community workaround in ~/.config/opencode/opencode.json (manually overriding the discovered context length):

```json
{
  "provider": {
    "ollama": {
      "models": {
        "qwen3:32b": {
          "contextLength": 32768
        }
      }
    }
  }
}
```
6. RECOMMENDATIONS
6.1 Strong Recommendations (Based on Verified Feedback)
1. **Use 27B+ models minimum** for reliable tool calling
   - Qwen 3.5 27B (Q3_XXS or Q4_K_XL)
   - Gemma 4 26B (IQ4_XS)
   - Avoid 14B and smaller for complex tasks
2. **Set the context window to 32K minimum** for local models
   - The default 4K is insufficient for OpenCode's verbose prompts
3. **Use LM Studio or llama.cpp over Ollama** for better tool calling
   - Community reports more consistent results
   - May relate to chat template handling
4. **Use correct chat templates**
   - The default Qwen3.5 template causes 500 errors
   - Use the corrected template from the community gist
6.2 Weak Recommendations (Inferred from Analysis)
1. **Prompt compression** would benefit local models:
   - Remove redundant examples from CoderPrompt
   - Shorten tool descriptions (especially bash)
   - Consider "instruction hierarchy" formatting
2. **Tool description tiering:**
   - "Essential" tools for small models
   - "Extended" tools for large models
3. **JSON resilience layer:**
   - Partial JSON repair
   - Tool name fuzzy matching
   - Parameter type coercion
6.3 Architecture Observations
Good for Local Models:
- Stateless sub-agents prevent context overflow
- Minimal TaskPrompt for search operations
- Auto-discovery of local endpoints
- Provider abstraction allows local/remote mixing
Challenging for Local Models:
- No prompt compression/tiering
- No tool subset selection
- No JSON repair for malformed calls
- No KV cache sharing between agents
7. BENCHMARK DATA
From Rost Glukhov's testing (March 2026):
| Model | IndexNow Task | Migration Error | Speed |
|---|---|---|---|
| Qwen 3.5 27b Q3_XXS | ✅ Pass | 5.0% | 34 tok/s |
| Gemma 4 26B IQ4_XS | ✅ Pass | 6.2% | ~30 tok/s |
| Qwen 3 14b | ❌ Fail | — | — |
| GPT-OSS 20b | ❌ Fail | — | stalls |
Threshold appears to be ~24B parameters for reliable OpenCode operation.
8. SOURCE REFERENCES
- GitHub Issue #4428: Local LLM connection issues (36 comments)
- GitHub Issue #13982: GLM-5 JSON parsing failures
- GitHub Issue #1212: Context window overflow
- Reddit r/opencodeCLI: Local model recommendations
- Reddit r/LocalLLaMA: KV cache invalidation discussion
- Aayush Garg blog: Qwen3.5 + llama.cpp + OpenCode setup
- Rost Glukhov benchmark: Local LLM comparison
Analysis conducted by examining opencode repository source code and synthesizing community feedback from multiple sources. Strong conclusions are backed by multiple verified reports; weak conclusions are reasoned inferences requiring further validation.