OpenCode Repository Analysis: Local Model Compatibility
Date: April 9, 2026
Analysis Focus: Prompts, Tools, Parsing, and Skills for Local/Smaller Models
Source: Code analysis of opencode-ai/opencode + Community feedback from GitHub, Reddit, Discord
Executive Summary
OpenCode is a Go-based coding agent with heavy optimization for frontier models (Claude, GPT-4o). The codebase shows strong architectural decisions for general use but has specific pain points for local models that manifest as JSON parsing errors, tool calling failures, and context truncation issues.
Verdict: Works well with local models 27B+ (Qwen3.5 27B, Gemma 4 26B) with configuration adjustments. Smaller models (7B-14B) struggle due to prompt complexity and tool count.
1. PROMPTS Analysis
1.1 Prompt Structure
OpenCode uses provider-specific prompts with significant differences between Anthropic and OpenAI formats:
| File | Purpose | Lines | Strength for Local Models |
|---|---|---|---|
| prompt/coder.go | Main coding agent | ~220 | ⚠️ VERBOSE - Complex instructions |
| prompt/task.go | Sub-agent (search) | ~17 | ✅ GOOD - Minimal, focused |
| prompt/summarizer.go | Session summary | ~16 | ✅ GOOD - Simple directive |
| prompt/title.go | Session titling | ~13 | ✅ GOOD - Single task |
1.2 The Coder Prompt (Critical Analysis)
Location: internal/llm/prompt/coder.go
The baseAnthropicCoderPrompt is excessively verbose (~170 lines of instructions). Key sections:
Sections that add token overhead:
- Tone and style (lines 86-93): ~400 tokens of verbosity constraints
- Examples section (lines 94-135): multiple `<example>` blocks
- Proactiveness guidelines (lines 137-142)
- Following conventions (lines 144-149)
- Code style rules (lines 151-152)
- "Doing tasks" workflow (lines 155-162)
- Tool usage policy (lines 163-166)
Strong Conclusion (Verified):
- The prompt is designed for models with strong instruction-following (Claude 3.5+, GPT-4o)
- Local models 14B and smaller struggle to retain all constraints
- Community feedback confirms: "Qwen 3 14b fails" while "Qwen 3.5 27b works well"
Weak Conclusion (Inference):
- The verbosity may cause "instruction dilution" where smaller models fixate on early/late instructions and miss middle constraints
- The example-heavy format (6+ examples) may be over-optimizing for frontier models
1.3 What Works Well
✅ Provider-aware prompting - Different prompts for Anthropic vs OpenAI endpoints
✅ Environment injection - Dynamic context (working dir, git status, platform, date)
✅ Project-specific context - Auto-loading from OpenCode.md or configured paths
✅ LSP integration hints - Conditional diagnostics info only when LSP available
1.4 Problems for Local Models
❌ Excessive constraints - "You MUST..." appears 8+ times, creating conflicting priorities
❌ Nested conditionals - "If X then Y unless Z in which case..." structure
❌ Implicit dependencies - Assumes model can track multiple tool calls across turns
1.5 Community Evidence
"Local models are more for vibe coding. Not really set for agentic coding. Unless you can host minimax2.5 to actually be worthwhile." — Reddit r/opencodeCLI
"Qwen 3 14b - fails with hallucinations" vs "Qwen 3.5 27b Q3_XXS - 5.0% migration error, clear winner for local use" — Rost Glukhov benchmark
2. TOOLS Analysis
2.1 Tool Inventory
Coder Agent Tools: 12 tools in total (11 core tools plus the agent sub-agent tool)
| Tool | Description Length | Params | Risk for Local Models |
|---|---|---|---|
| bash | ~200 lines | 2 | ⚠️ HIGH - Complex bash description with git/PR instructions |
| edit | ~90 lines | 3 | ⚠️ MEDIUM - Requires precise string matching |
| write | ~60 lines | 2 | ✅ LOW - Straightforward |
| view | ~70 lines | 3 | ✅ LOW - Well-documented |
| glob | ~40 lines | 1 | ✅ LOW - Simple |
| grep | ~80 lines | 4 | ⚠️ MEDIUM - Regex/literal_text nuance |
| ls | ~30 lines | 2 | ✅ LOW - Simple |
| fetch | ~40 lines | 1 | ✅ LOW - Simple |
| patch | ~50 lines | 2 | ⚠️ MEDIUM - Requires understanding diff format |
| sourcegraph | ~30 lines | 1 | ✅ LOW - Simple |
| diagnostics | ~20 lines | 1 | ✅ LOW - Simple |
| agent | ~40 lines | 1 | ⚠️ MEDIUM - Meta-cognitive (sub-agent) |
2.2 Tool Description Problems
CRITICAL ISSUE: Bash Tool Description
Location: internal/llm/tools/bash.go lines 57-203
The bash tool description is excessively long (~3500 characters) and includes:
- Directory verification steps
- Security check procedures
- Command execution flow
- Output processing rules
- Git commit workflow (lines 97-151)
- PR creation workflow (lines 153-199)
Strong Conclusion (Verified):
- Community reports "invalid tool call message with wrong tool name" errors
- GitHub Issue #13982: GLM-5 "screwing up the JSON parsing" specifically on read tool
Weak Conclusion (Inference):
- The tool descriptions may exceed effective context window for 8K-16K models when combined with prompts
- Local models may "lose track" of which tool they're calling due to description overload
2.3 Tool Calling Issues (Community Verified)
GitHub Issue #4428 (36 comments): "Why is opencode not working with local llms via Ollama?"
"After many issues with Ollama (mostly that all models default to a very small context window and you have to modify them or find versions with bigger context window settings, and tool call formatting issues), after installing LM Studio I was able to consistently use qwen/qwen3-30b-a3b-2507 with tools"
GitHub Issue #13982: "[bug] GLM 5 keeps screwing up the json parsing of read tool calling"
"The AI keeps screwing up the JSON formatting for the tool calling. Sometimes I even get 'Method Not Allowed' errors that stops the build dead in the tracks."
3. PARSING Analysis
3.1 Tool Call Parsing Strategy
Location: internal/llm/provider/openai.go
OpenCode relies on native function calling via the OpenAI SDK:
```go
func (o *openaiClient) toolCalls(completion openai.ChatCompletion) []message.ToolCall {
	// Extracts tool calls from the API response.
	// Assumes the provider returns well-formed JSON.
}
```
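Because the extraction path assumes well-formed JSON, a caller can at least verify that assumption cheaply before dispatch. A minimal sketch of such a check (the `ToolCall` struct below is a stand-in for OpenCode's actual `message.ToolCall` type, not the real definition):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ToolCall is a stand-in for OpenCode's message.ToolCall type.
type ToolCall struct {
	Name  string
	Input string // raw JSON arguments as returned by the provider
}

// validate rejects calls whose arguments are not syntactically valid JSON,
// so a malformed local-model response fails fast instead of at unmarshal time.
func validate(tc ToolCall) error {
	if !json.Valid([]byte(tc.Input)) {
		return fmt.Errorf("tool %q: arguments are not valid JSON: %s", tc.Name, tc.Input)
	}
	return nil
}

func main() {
	good := ToolCall{Name: "view", Input: `{"file_path": "main.go"}`}
	bad := ToolCall{Name: "view", Input: `{"file_path": "main.go",}`} // trailing comma

	fmt.Println(validate(good)) // <nil>
	fmt.Println(validate(bad))  // non-nil error describing the bad arguments
}
```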
Strong Conclusion (Verified):
- Uses standard OpenAI function calling format (works with llama.cpp, vLLM, Ollama)
- No custom JSON parsing for tool arguments (relies on SDK/provider)
3.2 The Problem: Local Model Output
Weak Conclusion (Inference from patterns):
Local models often produce:
- Malformed JSON - Trailing commas, unescaped quotes, missing braces
- Partial tool calls - Starting JSON but not completing before max_tokens
- Invalid tool names - Hallucinating tools that don't exist
- Parameter type mismatches - Sending strings where numbers expected
The codebase has NO resilience for:
- JSON repair/relaxation
- Partial tool call streaming recovery
- Tool name fuzzy matching
- Parameter coercion
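None of these mitigations exist in the codebase today. To make the gap concrete, here is a hypothetical pre-dispatch layer covering two of the four items: a crude trailing-comma repair and fuzzy tool-name matching. The tool list matches the inventory in section 2.1, but everything else (function names, the edit-distance cutoff) is illustrative, not OpenCode code:

```go
package main

import (
	"fmt"
	"strings"
)

var knownTools = []string{
	"bash", "edit", "write", "view", "glob", "grep",
	"ls", "fetch", "patch", "sourcegraph", "diagnostics", "agent",
}

// repairJSON applies one cheap relaxation: stripping trailing commas before a
// closing brace or bracket. (Naive: it would also mangle ",}" inside string
// values; real repair needs a tolerant parser.)
func repairJSON(s string) string {
	s = strings.ReplaceAll(s, ",}", "}")
	s = strings.ReplaceAll(s, ",]", "]")
	return s
}

// matchTool maps a possibly hallucinated tool name to the closest known tool
// by Levenshtein distance, rejecting anything more than 2 edits away.
func matchTool(name string) (string, bool) {
	best, bestDist := "", 3
	for _, t := range knownTools {
		if d := levenshtein(strings.ToLower(name), t); d < bestDist {
			best, bestDist = t, d
		}
	}
	return best, best != ""
}

// levenshtein is the standard dynamic-programming edit distance.
func levenshtein(a, b string) int {
	prev := make([]int, len(b)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(a); i++ {
		cur := make([]int, len(b)+1)
		cur[0] = i
		for j := 1; j <= len(b); j++ {
			cost := 1
			if a[i-1] == b[j-1] {
				cost = 0
			}
			cur[j] = min(min(prev[j]+1, cur[j-1]+1), prev[j-1]+cost)
		}
		prev = cur
	}
	return prev[len(b)]
}

func main() {
	fmt.Println(repairJSON(`{"path": "main.go",}`)) // {"path": "main.go"}

	name, ok := matchTool("bashh") // one-character slip from a local model
	fmt.Println(name, ok)          // bash true
}
```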
3.3 Context Window Truncation
Strong Conclusion (Verified):
GitHub Issue #1212: "Fetched documentation exceeds context window limit"
"When opencode pulls documentation from websites, the resulting response can sometimes exceed the context length of the current model in use (currently Claude Sonnet 4 for me). It's impossible to continue this session in this case."
Config gap: internal/llm/models/local.go sets:

```go
ContextWindow: cmp.Or(model.LoadedContextLength, 4096), // Falls back to 4K!
```
Community fix (Medium article): Must increase Ollama context from 4K to 32K for reasonable performance.
4. SKILLS / SUB-AGENTS Analysis
4.1 Agent Tool Architecture
Location: internal/llm/agent/agent-tool.go
```go
const AgentToolName = "agent"

// Description emphasizes:
// - Parallel execution (good)
// - Stateless operation
// - Read-only (no bash/edit/write)
```
Strong Conclusion (Verified):
- Sub-agents use TaskPrompt (minimal) vs CoderPrompt (verbose)
- Task agent only gets: Glob, Grep, LS, Sourcegraph, View
- This is actually good design - search tasks don't need editing tools
4.2 KV Cache Invalidation Issue
Strong Conclusion (Verified):
Reddit r/LocalLLaMA:
"I tried opencode, it also works fine with qwen models but the kv cache was invalidated when working with gpt 120B model."
"When a new sub agent is spun, the kv cache from parent is not reused so for the sub agent model processed the whole prompt again."
This is an architectural limitation - each agent spawns a new session/context.
4.3 Sub-Agent Loop Risk
Weak Conclusion (Inference):
The agent tool description says:
"The agent's outputs should generally be trusted"
For local models, this trust may be misplaced:
- Sub-agent may return incomplete search results
- No verification loop in parent agent
- Can lead to cascading errors
5. LOCAL MODEL CONFIGURATION
5.1 Auto-Discovery (Good)
Location: internal/llm/models/local.go
```go
// Automatically discovers models from:
// - v1/models endpoint (OpenAI compatible)
// - api/v0/models endpoint (LM Studio)
// Sets defaults for all agents
```
✅ Works well - No manual model registration needed for local endpoints
5.2 Context Window Defaults (Bad)
```go
ContextWindow: cmp.Or(model.LoadedContextLength, 4096),
```
❌ 4K fallback is too small for the verbose prompts + tool descriptions
Community workaround in ~/.config/opencode/opencode.json (manually overriding the discovered context length):

```json
{
  "provider": {
    "ollama": {
      "models": {
        "qwen3:32b": {
          "contextLength": 32768
        }
      }
    }
  }
}
```
6. RECOMMENDATIONS
6.1 Strong Recommendations (Based on Verified Feedback)
1. **Use 27B+ models minimum** for reliable tool calling
   - Qwen 3.5 27B (Q3_XXS or Q4_K_XL)
   - Gemma 4 26B (IQ4_XS)
   - Avoid 14B and smaller for complex tasks
2. **Set the context window to 32K minimum** for local models
   - The default 4K is insufficient for OpenCode's verbose prompts
3. **Use LM Studio or llama.cpp over Ollama** for better tool calling
   - Community reports more consistent results
   - May relate to chat template handling
4. **Use correct chat templates**
   - The default Qwen3.5 template causes 500 errors
   - Use the corrected template from the community gist
6.2 Weak Recommendations (Inferred from Analysis)
1. **Prompt compression** would benefit local models:
   - Remove redundant examples from CoderPrompt
   - Shorten tool descriptions (especially bash)
   - Consider "instruction hierarchy" formatting
2. **Tool description tiering:**
   - "Essential" tools for small models
   - "Extended" tools for large models
3. **JSON resilience layer:**
   - Partial JSON repair
   - Tool name fuzzy matching
   - Parameter type coercion
6.3 Architecture Observations
Good for Local Models:
- Stateless sub-agents prevent context overflow
- Minimal TaskPrompt for search operations
- Auto-discovery of local endpoints
- Provider abstraction allows local/remote mixing
Challenging for Local Models:
- No prompt compression/tiering
- No tool subset selection
- No JSON repair for malformed calls
- No KV cache sharing between agents
7. BENCHMARK DATA
From Rost Glukhov's testing (March 2026):
| Model | IndexNow Task | Migration Error | Speed |
|---|---|---|---|
| Qwen 3.5 27b Q3_XXS | ✅ Pass | 5.0% | 34 tok/s |
| Gemma 4 26B IQ4_XS | ✅ Pass | 6.2% | ~30 tok/s |
| Qwen 3 14b | ❌ Fail | — | — |
| GPT-OSS 20b | ❌ Fail | — | stalls |
Threshold appears to be ~24B parameters for reliable OpenCode operation.
8. SOURCE REFERENCES
- GitHub Issue #4428: Local LLM connection issues (36 comments)
- GitHub Issue #13982: GLM-5 JSON parsing failures
- GitHub Issue #1212: Context window overflow
- Reddit r/opencodeCLI: Local model recommendations
- Reddit r/LocalLLaMA: KV cache invalidation discussion
- Aayush Garg blog: Qwen3.5 + llama.cpp + OpenCode setup
- Rost Glukhov benchmark: Local LLM comparison
Analysis conducted by examining opencode repository source code and synthesizing community feedback from multiple sources. Strong conclusions are backed by multiple verified reports; weak conclusions are reasoned inferences requiring further validation.