# OpenCode Repository Analysis: Local Model Compatibility
**Date:** April 9, 2026
**Analysis Focus:** Prompts, Tools, Parsing, and Skills for Local/Smaller Models
**Source:** Code analysis of `opencode-ai/opencode` + Community feedback from GitHub, Reddit, Discord
---
## Executive Summary
OpenCode is a Go-based coding agent with heavy optimization for frontier models (Claude, GPT-4o). The codebase shows **strong architectural decisions** for general use but has **specific pain points for local models** that manifest as JSON parsing errors, tool calling failures, and context truncation issues.
**Verdict:** Works well with local models of 27B+ parameters (Qwen3.5 27B, Gemma 4 26B) given configuration adjustments. Smaller models (7B-14B) struggle with the prompt complexity and tool count.
---
## 1. PROMPTS Analysis
### 1.1 Prompt Structure
OpenCode uses **provider-specific prompts** with significant differences between Anthropic and OpenAI formats:
| File | Purpose | Lines | Strength for Local Models |
|------|---------|-------|---------------------------|
| `prompt/coder.go` | Main coding agent | ~220 | ⚠️ VERBOSE - Complex instructions |
| `prompt/task.go` | Sub-agent (search) | ~17 | ✅ GOOD - Minimal, focused |
| `prompt/summarizer.go` | Session summary | ~16 | ✅ GOOD - Simple directive |
| `prompt/title.go` | Session titling | ~13 | ✅ GOOD - Single task |
### 1.2 The Coder Prompt (Critical Analysis)
**Location:** `internal/llm/prompt/coder.go`
The `baseAnthropicCoderPrompt` is **excessively verbose** (~170 lines of instructions). Key sections:
```text
Sections that add token overhead:
- Tone and style (lines 86-93): ~400 tokens of verbosity constraints
- Examples section (lines 94-135): multiple <example> blocks
- Proactiveness guidelines (lines 137-142)
- Following conventions (lines 144-149)
- Code style rules (lines 151-152)
- Doing tasks workflow (lines 155-162)
- Tool usage policy (lines 163-166)
```
**Strong Conclusion (Verified):**
- The prompt is designed for models with strong instruction-following (Claude 3.5+, GPT-4o)
- Local models 14B and smaller struggle to retain all constraints
- Community feedback confirms: "Qwen 3 14b fails" while "Qwen 3.5 27b works well"
**Weak Conclusion (Inference):**
- The verbosity may cause "instruction dilution" where smaller models fixate on early/late instructions and miss middle constraints
- The example-heavy format (6+ examples) may be over-optimizing for frontier models
### 1.3 What Works Well
- **Provider-aware prompting** - Different prompts for Anthropic vs OpenAI endpoints
- **Environment injection** - Dynamic context (working dir, git status, platform, date)
- **Project-specific context** - Auto-loading from `OpenCode.md` or configured paths
- **LSP integration hints** - Conditional diagnostics info only when LSP available
### 1.4 Problems for Local Models
- **Excessive constraints** - "You MUST..." appears 8+ times, creating conflicting priorities
- **Nested conditionals** - "If X then Y unless Z in which case..." structure
- **Implicit dependencies** - Assumes the model can track multiple tool calls across turns
### 1.5 Community Evidence
> "Local models are more for vibe coding. Not really set for agentic coding. Unless you can host minimax2.5 to actually be worthwhile." — Reddit r/opencodeCLI
> "Qwen 3 14b - fails with hallucinations" vs "Qwen 3.5 27b Q3_XXS - 5.0% migration error, clear winner for local use" — Rost Glukhov benchmark
---
## 2. TOOLS Analysis
### 2.1 Tool Inventory
**Coder Agent Tools:** 11 core tools plus the `agent` sub-agent tool (12 total):
| Tool | Description Length | Params | Risk for Local Models |
|------|-------------------|--------|----------------------|
| `bash` | ~200 lines | 2 | ⚠️ HIGH - Complex bash description with git/PR instructions |
| `edit` | ~90 lines | 3 | ⚠️ MEDIUM - Requires precise string matching |
| `write` | ~60 lines | 2 | ✅ LOW - Straightforward |
| `view` | ~70 lines | 3 | ✅ LOW - Well-documented |
| `glob` | ~40 lines | 1 | ✅ LOW - Simple |
| `grep` | ~80 lines | 4 | ⚠️ MEDIUM - Regex/literal_text nuance |
| `ls` | ~30 lines | 2 | ✅ LOW - Simple |
| `fetch` | ~40 lines | 1 | ✅ LOW - Simple |
| `patch` | ~50 lines | 2 | ⚠️ MEDIUM - Requires understanding diff format |
| `sourcegraph` | ~30 lines | 1 | ✅ LOW - Simple |
| `diagnostics` | ~20 lines | 1 | ✅ LOW - Simple |
| `agent` | ~40 lines | 1 | ⚠️ MEDIUM - Meta-cognitive (sub-agent) |
### 2.2 Tool Description Problems
**CRITICAL ISSUE: Bash Tool Description**
Location: `internal/llm/tools/bash.go` lines 57-203
The bash tool description is **excessively long** (~3500 characters) and includes:
- Directory verification steps
- Security check procedures
- Command execution flow
- Output processing rules
- Git commit workflow (lines 97-151)
- PR creation workflow (lines 153-199)
**Strong Conclusion (Verified):**
- Community reports "invalid tool call message with wrong tool name" errors
- GitHub Issue #13982: GLM-5 "screwing up the JSON parsing", specifically on the read tool
**Weak Conclusion (Inference):**
- The tool descriptions may exceed effective context window for 8K-16K models when combined with prompts
- Local models may "lose track" of which tool they're calling due to description overload
### 2.3 Tool Calling Issues (Community Verified)
GitHub Issue #4428 (36 comments): "Why is opencode not working with local llms via Ollama?"
> "After many issues with Ollama (mostly that all models default to a very small context window and you have to modify them or find versions with bigger context window settings, and tool call formatting issues), after installing LM Studio I was able to consistently use qwen/qwen3-30b-a3b-2507 with tools"
GitHub Issue #13982: "[bug] GLM 5 keeps screwing up the json parsing of read tool calling"
> "The AI keeps screwing up the JSON formatting for the tool calling. Sometimes I even get 'Method Not Allowed' errors that stops the build dead in the tracks."
---
## 3. PARSING Analysis
### 3.1 Tool Call Parsing Strategy
**Location:** `internal/llm/provider/openai.go`
OpenCode relies on **native function calling** via the OpenAI SDK:
```go
func (o *openaiClient) toolCalls(completion openai.ChatCompletion) []message.ToolCall {
	// Extracts tool calls from the API response.
	// Assumes the provider returns well-formed JSON.
}
```
**Strong Conclusion (Verified):**
- Uses standard OpenAI function calling format (works with llama.cpp, vLLM, Ollama)
- No custom JSON parsing for tool arguments (relies on SDK/provider)
### 3.2 The Problem: Local Model Output
**Weak Conclusion (Inference from patterns):**
Local models often produce:
1. **Malformed JSON** - Trailing commas, unescaped quotes, missing braces
2. **Partial tool calls** - Starting JSON but not completing before max_tokens
3. **Invalid tool names** - Hallucinating tools that don't exist
4. **Parameter type mismatches** - Sending strings where numbers expected
**The codebase has NO resilience for:**
- JSON repair/relaxation
- Partial tool call streaming recovery
- Tool name fuzzy matching
- Parameter coercion
### 3.3 Context Window Truncation
**Strong Conclusion (Verified):**
GitHub Issue #1212: "Fetched documentation exceeds context window limit"
> "When opencode pulls documentation from websites, the resulting response can sometimes exceed the context length of the current model in use (currently Claude Sonnet 4 for me). It's impossible to continue this session in this case."
**Config gap:** `internal/llm/models/local.go` sets:
```go
ContextWindow: cmp.Or(model.LoadedContextLength, 4096), // Falls back to 4K!
```
Community fix (Medium article): increase the Ollama context window from the 4K default to 32K for reasonable performance.
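On the Ollama side, the usual shape of that fix is a custom Modelfile that raises `num_ctx` (the model tag below is illustrative):

```
FROM qwen3:32b
PARAMETER num_ctx 32768
```

Registering it with `ollama create qwen3-32k -f Modelfile` yields a variant that should report the larger context to clients.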
---
## 4. SKILLS / SUB-AGENTS Analysis
### 4.1 Agent Tool Architecture
**Location:** `internal/llm/agent/agent-tool.go`
```go
const AgentToolName = "agent"
// Description emphasizes:
// - Parallel execution (good)
// - Stateless operation
// - Read-only (no bash/edit/write)
```
**Strong Conclusion (Verified):**
- Sub-agents use **TaskPrompt** (minimal) vs **CoderPrompt** (verbose)
- Task agent only gets: Glob, Grep, LS, Sourcegraph, View
- This is actually **good design** - search tasks don't need editing tools
### 4.2 KV Cache Invalidation Issue
**Strong Conclusion (Verified):**
Reddit r/LocalLLaMA:
> "I tried opencode, it also works fine with qwen models but the kv cache was invalidated when working with gpt 120B model."
> "When a new sub agent is spun, the kv cache from parent is not reused so for the sub agent model processed the whole prompt again."
This is an **architectural limitation** - each agent spawns a new session/context.
### 4.3 Sub-Agent Loop Risk
**Weak Conclusion (Inference):**
The agent tool description says:
> "The agent's outputs should generally be trusted"
For local models, this trust may be misplaced:
- Sub-agent may return incomplete search results
- No verification loop in parent agent
- Can lead to cascading errors
---
## 5. LOCAL MODEL CONFIGURATION
### 5.1 Auto-Discovery (Good)
**Location:** `internal/llm/models/local.go`
```go
// Automatically discovers models from:
// - v1/models endpoint (OpenAI compatible)
// - api/v0/models endpoint (LM Studio)
// Sets defaults for all agents
```
**Works well** - No manual model registration needed for local endpoints
### 5.2 Context Window Defaults (Bad)
```go
ContextWindow: cmp.Or(model.LoadedContextLength, 4096),
```
**4K fallback is too small** for the verbose prompts + tool descriptions
Community workaround:
```json
// ~/.config/opencode/opencode.json
{
  "provider": {
    "ollama": {
      "models": {
        "qwen3:32b": {
          "contextLength": 32768 // Manually override
        }
      }
    }
  }
}
```
---
## 6. RECOMMENDATIONS
### 6.1 Strong Recommendations (Based on Verified Feedback)
1. **Use 27B+ models minimum** for reliable tool calling
   - Qwen 3.5 27B (Q3_XXS or Q4_K_XL)
   - Gemma 4 26B (IQ4_XS)
   - Avoid 14B and smaller for complex tasks
2. **Set context window to 32K minimum** for local models
   - Default 4K is insufficient for OpenCode's verbose prompts
3. **Use LM Studio or llama.cpp over Ollama** for better tool calling
   - Community reports more consistent results
   - May relate to chat template handling
4. **Correct chat templates required**
   - Default Qwen3.5 template causes 500 errors
   - Use corrected template from community gist
### 6.2 Weak Recommendations (Inferred from Analysis)
1. **Prompt compression** would benefit local models:
   - Remove redundant examples from CoderPrompt
   - Shorten tool descriptions (especially bash)
   - Consider "instruction hierarchy" formatting
2. **Tool description tiering**:
   - "Essential" tools for small models
   - "Extended" tools for large models
3. **JSON resilience layer**:
   - Partial JSON repair
   - Tool name fuzzy matching
   - Parameter type coercion
### 6.3 Architecture Observations
**Good for Local Models:**
- Stateless sub-agents prevent context overflow
- Minimal TaskPrompt for search operations
- Auto-discovery of local endpoints
- Provider abstraction allows local/remote mixing
**Challenging for Local Models:**
- No prompt compression/tiering
- No tool subset selection
- No JSON repair for malformed calls
- No KV cache sharing between agents
---
## 7. BENCHMARK DATA
From Rost Glukhov's testing (March 2026):
| Model | IndexNow Task | Migration Error | Speed |
|-------|--------------|-----------------|-------|
| Qwen 3.5 27b Q3_XXS | ✅ Pass | 5.0% | 34 tok/s |
| Gemma 4 26B IQ4_XS | ✅ Pass | 6.2% | ~30 tok/s |
| Qwen 3 14b | ❌ Fail | — | — |
| GPT-OSS 20b | ❌ Fail | — | stalls |
**Threshold appears to be ~24B parameters** for reliable OpenCode operation.
---
## 8. SOURCE REFERENCES
- GitHub Issue #4428: Local LLM connection issues (36 comments)
- GitHub Issue #13982: GLM-5 JSON parsing failures
- GitHub Issue #1212: Context window overflow
- Reddit r/opencodeCLI: Local model recommendations
- Reddit r/LocalLLaMA: KV cache invalidation discussion
- Aayush Garg blog: Qwen3.5 + llama.cpp + OpenCode setup
- Rost Glukhov benchmark: Local LLM comparison
---
*Analysis conducted by examining opencode repository source code and synthesizing community feedback from multiple sources. Strong conclusions are backed by multiple verified reports; weak conclusions are reasoned inferences requiring further validation.*