# OpenCode Repository Analysis: Local Model Compatibility

**Date:** April 9, 2026
**Analysis Focus:** Prompts, Tools, Parsing, and Skills for Local/Smaller Models
**Source:** Code analysis of `opencode-ai/opencode` + community feedback from GitHub, Reddit, Discord

---

## Executive Summary

OpenCode is a Go-based coding agent heavily optimized for frontier models (Claude, GPT-4o). The codebase shows **strong architectural decisions** for general use but has **specific pain points for local models** that manifest as JSON parsing errors, tool calling failures, and context truncation issues.

**Verdict:** Works well with local models 27B+ (Qwen 3.5 27B, Gemma 4 26B) given configuration adjustments. Smaller models (7B-14B) struggle due to prompt complexity and tool count.

---

## 1. PROMPTS Analysis

### 1.1 Prompt Structure

OpenCode uses **provider-specific prompts** with significant differences between Anthropic and OpenAI formats:

| File | Purpose | Lines | Strength for Local Models |
|------|---------|-------|---------------------------|
| `prompt/coder.go` | Main coding agent | ~220 | ⚠️ VERBOSE - Complex instructions |
| `prompt/task.go` | Sub-agent (search) | ~17 | ✅ GOOD - Minimal, focused |
| `prompt/summarizer.go` | Session summary | ~16 | ✅ GOOD - Simple directive |
| `prompt/title.go` | Session titling | ~13 | ✅ GOOD - Single task |

### 1.2 The Coder Prompt (Critical Analysis)

**Location:** `internal/llm/prompt/coder.go`

The `baseAnthropicCoderPrompt` is **excessively verbose** (~170 lines of instructions). Key sections:

```go
// Sections that add token overhead:
// - Tone and style (lines 86-93): ~400 tokens of verbosity constraints
// - Examples section (lines 94-135): multiple example blocks
// - Proactiveness guidelines (lines 137-142)
// - Following conventions (lines 144-149)
// - Code style rules (lines 151-152)
// - Doing tasks workflow (lines 155-162)
// - Tool usage policy (lines 163-166)
```

**Strong Conclusion (Verified):**

- The prompt is designed for models with strong instruction-following (Claude 3.5+, GPT-4o)
- Local models 14B and smaller struggle to retain all constraints
- Community feedback confirms: "Qwen 3 14b fails" while "Qwen 3.5 27b works well"

**Weak Conclusion (Inference):**

- The verbosity may cause "instruction dilution," where smaller models fixate on early/late instructions and miss the middle constraints
- The example-heavy format (6+ examples) may be over-optimized for frontier models

### 1.3 What Works Well

✅ **Provider-aware prompting** - Different prompts for Anthropic vs OpenAI endpoints
✅ **Environment injection** - Dynamic context (working dir, git status, platform, date); a sketch of this pattern appears at the end of this section
✅ **Project-specific context** - Auto-loading from `OpenCode.md` or configured paths
✅ **LSP integration hints** - Conditional diagnostics info only when LSP is available

### 1.4 Problems for Local Models

❌ **Excessive constraints** - "You MUST..." appears 8+ times, creating conflicting priorities
❌ **Nested conditionals** - "If X then Y unless Z in which case..." structures
❌ **Implicit dependencies** - Assumes the model can track multiple tool calls across turns

### 1.5 Community Evidence

> "Local models are more for vibe coding. Not really set for agentic coding. Unless you can host minimax2.5 to actually be worthwhile." — Reddit r/opencodeCLI

> "Qwen 3 14b - fails with hallucinations" vs "Qwen 3.5 27b Q3_XXS - 5.0% migration error, clear winner for local use" — Rost Glukhov benchmark
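To make the environment-injection pattern concrete, here is a minimal sketch of what appending dynamic session context to a static base prompt can look like. This is an illustrative reconstruction, not OpenCode's actual code: the function name `injectEnvironment` and the `<env>` tag format are hypothetical, and the real implementation also injects git status and LSP hints as noted above.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"time"
)

// injectEnvironment appends per-session context (working directory,
// platform, date) to a static base prompt. Hypothetical sketch; the
// real prompt also carries git status, omitted here for brevity.
func injectEnvironment(basePrompt string) string {
	cwd, err := os.Getwd()
	if err != nil {
		cwd = "unknown"
	}
	return fmt.Sprintf(`%s

<env>
Working directory: %s
Platform: %s
Today's date: %s
</env>`, basePrompt, cwd, runtime.GOOS, time.Now().Format("2006-01-02"))
}

func main() {
	fmt.Println(injectEnvironment("You are a coding agent."))
}
```

The token cost of this injected block is trivial; for small models the weight problem sits in the ~170-line instruction body, not in the dynamic context.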
---

## 2. TOOLS Analysis

### 2.1 Tool Inventory

**Coder Agent Tools:** 11 core tools plus the `agent` meta-tool

| Tool | Description Length | Params | Risk for Local Models |
|------|--------------------|--------|-----------------------|
| `bash` | ~200 lines | 2 | ⚠️ HIGH - Complex bash description with git/PR instructions |
| `edit` | ~90 lines | 3 | ⚠️ MEDIUM - Requires precise string matching |
| `write` | ~60 lines | 2 | ✅ LOW - Straightforward |
| `view` | ~70 lines | 3 | ✅ LOW - Well-documented |
| `glob` | ~40 lines | 1 | ✅ LOW - Simple |
| `grep` | ~80 lines | 4 | ⚠️ MEDIUM - Regex/literal_text nuance |
| `ls` | ~30 lines | 2 | ✅ LOW - Simple |
| `fetch` | ~40 lines | 1 | ✅ LOW - Simple |
| `patch` | ~50 lines | 2 | ⚠️ MEDIUM - Requires understanding diff format |
| `sourcegraph` | ~30 lines | 1 | ✅ LOW - Simple |
| `diagnostics` | ~20 lines | 1 | ✅ LOW - Simple |
| `agent` | ~40 lines | 1 | ⚠️ MEDIUM - Meta-cognitive (sub-agent) |

### 2.2 Tool Description Problems

**CRITICAL ISSUE: Bash Tool Description**

**Location:** `internal/llm/tools/bash.go`, lines 57-203

The bash tool description is **excessively long** (~3500 characters) and includes:

- Directory verification steps
- Security check procedures
- Command execution flow
- Output processing rules
- Git commit workflow (lines 97-151)
- PR creation workflow (lines 153-199)

**Strong Conclusion (Verified):**

- Community reports "invalid tool call message with wrong tool name" errors
- GitHub Issue #13982: GLM-5 "screwing up the JSON parsing" specifically on the read tool

**Weak Conclusion (Inference):**

- The tool descriptions, combined with the prompts, may exceed the effective context window of 8K-16K models
- Local models may "lose track" of which tool they are calling due to description overload

### 2.3 Tool Calling Issues (Community Verified)

GitHub Issue #4428 (36 comments): "Why is opencode not working with local llms via Ollama?"

> "After many issues with Ollama (mostly that all models default to a very small context window and you have to modify them or find versions with bigger context window settings, and tool call formatting issues), after installing LM Studio I was able to consistently use qwen/qwen3-30b-a3b-2507 with tools"

GitHub Issue #13982: "[bug] GLM 5 keeps screwing up the json parsing of read tool calling"

> "The AI keeps screwing up the JSON formatting for the tool calling. Sometimes I even get 'Method Not Allowed' errors that stops the build dead in the tracks."

---

## 3. PARSING Analysis

### 3.1 Tool Call Parsing Strategy

**Location:** `internal/llm/provider/openai.go`

OpenCode relies on **native function calling** via the OpenAI SDK:

```go
func (o *openaiClient) toolCalls(completion openai.ChatCompletion) []message.ToolCall {
	// Extracts tool calls from the API response.
	// Assumes the provider returns well-formed JSON.
}
```

**Strong Conclusion (Verified):**

- Uses the standard OpenAI function calling format (works with llama.cpp, vLLM, Ollama)
- No custom JSON parsing for tool arguments (relies on the SDK/provider)

### 3.2 The Problem: Local Model Output

**Weak Conclusion (Inference from patterns):**

Local models often produce:

1. **Malformed JSON** - Trailing commas, unescaped quotes, missing braces
2. **Partial tool calls** - Starting JSON but not completing it before `max_tokens`
3. **Invalid tool names** - Hallucinating tools that don't exist
4. **Parameter type mismatches** - Sending strings where numbers are expected

**The codebase has NO resilience for:**

- JSON repair/relaxation
- Partial tool call streaming recovery
- Tool name fuzzy matching
- Parameter coercion

A sketch of what such a resilience layer could look like follows this list.
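The missing pieces named above (and proposed again in section 6.2) are straightforward to sketch. The following is a minimal, hypothetical illustration in Go (none of these helpers exist in the opencode codebase): a trailing-comma repair pass plus Levenshtein-based tool name matching.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
	"strings"
)

// trailingComma matches a comma directly before a closing brace or
// bracket: the most common malformation reported for local models.
var trailingComma = regexp.MustCompile(`,\s*([}\]])`)

// repairJSON is a hypothetical first-aid pass: try the raw arguments,
// and if they fail to parse, retry after stripping trailing commas.
func repairJSON(raw string) (map[string]any, error) {
	var args map[string]any
	if err := json.Unmarshal([]byte(raw), &args); err == nil {
		return args, nil
	}
	fixed := trailingComma.ReplaceAllString(raw, "$1")
	if err := json.Unmarshal([]byte(fixed), &args); err != nil {
		return nil, fmt.Errorf("unrepairable tool arguments: %w", err)
	}
	return args, nil
}

// nearestToolName maps a hallucinated name onto the closest registered
// tool (assumed lowercase) by edit distance, within a small cutoff.
func nearestToolName(name string, registered []string) (string, bool) {
	best, bestDist := "", 3 // allow at most 2 edits
	for _, t := range registered {
		if d := levenshtein(strings.ToLower(name), t); d < bestDist {
			best, bestDist = t, d
		}
	}
	return best, best != ""
}

// levenshtein computes edit distance with the standard two-row DP.
func levenshtein(a, b string) int {
	prev := make([]int, len(b)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(a); i++ {
		cur := make([]int, len(b)+1)
		cur[0] = i
		for j := 1; j <= len(b); j++ {
			cost := 1
			if a[i-1] == b[j-1] {
				cost = 0
			}
			cur[j] = min(prev[j]+1, cur[j-1]+1, prev[j-1]+cost)
		}
		prev = cur
	}
	return prev[len(b)]
}

func main() {
	args, _ := repairJSON(`{"file_path": "main.go",}`)
	fmt.Println(args) // map[file_path:main.go]
	name, _ := nearestToolName("vieww", []string{"view", "edit", "bash"})
	fmt.Println(name) // view
}
```

In practice such a layer should log every repair rather than rewrite calls silently: a wrong fuzzy match onto a destructive tool like `bash` or `edit` would be worse than a hard failure.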
### 3.3 Context Window Truncation

**Strong Conclusion (Verified):**

GitHub Issue #1212: "Fetched documentation exceeds context window limit"

> "When opencode pulls documentation from websites, the resulting response can sometimes exceed the context length of the current model in use (currently Claude Sonnet 4 for me). It's impossible to continue this session in this case."

**Config gap:** `internal/llm/models/local.go` sets:

```go
ContextWindow: cmp.Or(model.LoadedContextLength, 4096), // Falls back to 4K!
```

Community fix (Medium article): increase the Ollama context from 4K to 32K for reasonable performance.

---

## 4. SKILLS / SUB-AGENTS Analysis

### 4.1 Agent Tool Architecture

**Location:** `internal/llm/agent/agent-tool.go`

```go
const AgentToolName = "agent"

// Description emphasizes:
// - Parallel execution (good)
// - Stateless operation
// - Read-only (no bash/edit/write)
```

**Strong Conclusion (Verified):**

- Sub-agents use the **TaskPrompt** (minimal) rather than the **CoderPrompt** (verbose)
- The task agent only gets: Glob, Grep, LS, Sourcegraph, View
- This is actually **good design** - search tasks don't need editing tools

### 4.2 KV Cache Invalidation Issue

**Strong Conclusion (Verified):**

Reddit r/LocalLLaMA:

> "I tried opencode, it also works fine with qwen models but the kv cache was invalidated when working with gpt 120B model."

> "When a new sub agent is spun, the kv cache from parent is not reused so for the sub agent model processed the whole prompt again."

This is an **architectural limitation** - each sub-agent spawns a new session/context.

### 4.3 Sub-Agent Loop Risk

**Weak Conclusion (Inference):**

The agent tool description says:

> "The agent's outputs should generally be trusted"

For local models, this trust may be misplaced:

- The sub-agent may return incomplete search results
- There is no verification loop in the parent agent
- This can lead to cascading errors

---

## 5. LOCAL MODEL CONFIGURATION

### 5.1 Auto-Discovery (Good)

**Location:** `internal/llm/models/local.go`

```go
// Automatically discovers models from:
// - v1/models endpoint (OpenAI compatible)
// - api/v0/models endpoint (LM Studio)
// Sets defaults for all agents
```

✅ **Works well** - No manual model registration is needed for local endpoints

### 5.2 Context Window Defaults (Bad)

```go
ContextWindow: cmp.Or(model.LoadedContextLength, 4096),
```

❌ **The 4K fallback is too small** for the verbose prompts plus tool descriptions

Community workaround:

```json
// ~/.config/opencode/opencode.json
{
  "provider": {
    "ollama": {
      "models": {
        "qwen3:32b": {
          "contextLength": 32768 // Manually override
        }
      }
    }
  }
}
```

---

## 6. RECOMMENDATIONS

### 6.1 Strong Recommendations (Based on Verified Feedback)

1. **Use 27B+ models minimum** for reliable tool calling
   - Qwen 3.5 27B (Q3_XXS or Q4_K_XL)
   - Gemma 4 26B (IQ4_XS)
   - Avoid 14B and smaller for complex tasks
2. **Set the context window to 32K minimum** for local models
   - The default 4K is insufficient for OpenCode's verbose prompts (see the Ollama example after this list)
3. **Use LM Studio or llama.cpp over Ollama** for better tool calling
   - Community reports more consistent results
   - May relate to chat template handling
4. **Correct chat templates are required**
   - The default Qwen 3.5 template causes 500 errors
   - Use the corrected template from the community gist
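On the Ollama side, the 32K recommendation can be applied with a custom Modelfile rather than per-session settings. A minimal sketch, assuming a locally pulled `qwen3:32b` tag (the tag and derived model name are examples):

```
# Modelfile: raise Ollama's default context so OpenCode's prompts fit
FROM qwen3:32b
PARAMETER num_ctx 32768
```

Build it with `ollama create qwen3-32k -f Modelfile`, and keep the `contextLength` override in `opencode.json` (section 5.2) in agreement so both sides assume the same window.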
### 6.2 Weak Recommendations (Inferred from Analysis)

1. **Prompt compression** would benefit local models:
   - Remove redundant examples from the CoderPrompt
   - Shorten tool descriptions (especially bash)
   - Consider "instruction hierarchy" formatting
2. **Tool description tiering:**
   - "Essential" tools for small models
   - "Extended" tools for large models
3. **JSON resilience layer:**
   - Partial JSON repair
   - Tool name fuzzy matching
   - Parameter type coercion

### 6.3 Architecture Observations

**Good for Local Models:**

- Stateless sub-agents prevent context overflow
- Minimal TaskPrompt for search operations
- Auto-discovery of local endpoints
- Provider abstraction allows local/remote mixing

**Challenging for Local Models:**

- No prompt compression/tiering
- No tool subset selection
- No JSON repair for malformed calls
- No KV cache sharing between agents

---

## 7. BENCHMARK DATA

From Rost Glukhov's testing (March 2026):

| Model | IndexNow Task | Migration Error | Speed |
|-------|---------------|-----------------|-------|
| Qwen 3.5 27B Q3_XXS | ✅ Pass | 5.0% | 34 tok/s |
| Gemma 4 26B IQ4_XS | ✅ Pass | 6.2% | ~30 tok/s |
| Qwen 3 14B | ❌ Fail | — | — |
| GPT-OSS 20B | ❌ Fail | — | stalls |

**The threshold appears to be ~24B parameters** for reliable OpenCode operation.

---

## 8. SOURCE REFERENCES

- GitHub Issue #4428: Local LLM connection issues (36 comments)
- GitHub Issue #13982: GLM-5 JSON parsing failures
- GitHub Issue #1212: Context window overflow
- Reddit r/opencodeCLI: Local model recommendations
- Reddit r/LocalLLaMA: KV cache invalidation discussion
- Aayush Garg blog: Qwen 3.5 + llama.cpp + OpenCode setup
- Rost Glukhov benchmark: Local LLM comparison

---

*Analysis conducted by examining the opencode repository source code and synthesizing community feedback from multiple sources. Strong conclusions are backed by multiple verified reports; weak conclusions are reasoned inferences requiring further validation.*