Harnesses under analysis:
- opencode (Go-based coding agent)
- pi (minimal terminal coding harness by Mario Zechner)
- hermes (Nous Research agent)
- forgecode (AI pair programmer with sub-agents)

Each harness folder contains:
- repo/: Source code from respective repositories
- feedback/localllm/: Community feedback for local/smaller models
- feedback/frontier/: Community feedback for frontier models

Research focus: Tool handling, skills systems, prompt engineering, context management, and best practices for smaller/local models.
Local Model Setup Issues & Solutions
Source reference: GitHub issues, Reddit, official FAQ, blog posts
Issue #523: Local Model Setup Skill Request
Problem: Users struggle with local model configuration
"No model recommendations: Users must know which models support tool calling. There's no guidance on model selection. No setup instructions: No docs or skills for installing/configuring Ollama, llama.cpp, or vLLM."
Requested Solution: A skill that guides users through:
- Setting up local models with Hermes Agent
- Model recommendations for different use cases
- Configuration nuances that trip up new users
Issue #1071: llama-server Compatibility (CRITICAL)
Error: `'dict' object has no attribute 'strip'`
Impact: Complete failure with llama-server/Ollama backends
Fix Location: `run_agent.py`, line ~4280
User Workaround:
```python
# In run_agent.py, add this before the existing check:
#   if not args or not args.strip():
# Requires `import json` at the top of the file; `continue` assumes the
# snippet sits inside the loop over returned tool calls.
if isinstance(args, (dict, list)):
    tc.function.arguments = json.dumps(args)
    continue
```
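For context, a minimal runnable sketch of the same normalization, assuming OpenAI-style tool-call objects as referenced in the issue (the `normalize_tool_call_arguments` helper and the `SimpleNamespace` stand-in are illustrative, not Hermes code):

```python
import json
from types import SimpleNamespace

# llama-server/Ollama backends can return `arguments` already parsed as a
# dict rather than the JSON string the rest of run_agent.py expects, which
# is what triggers the .strip() AttributeError.
def normalize_tool_call_arguments(tool_calls):
    for tc in tool_calls:
        args = tc.function.arguments
        if isinstance(args, (dict, list)):
            # Re-serialize so downstream string handling (.strip(), parsing) works.
            tc.function.arguments = json.dumps(args)

# Tiny demo with a stand-in tool-call object:
tc = SimpleNamespace(function=SimpleNamespace(arguments={"path": "README.md"}))
normalize_tool_call_arguments([tc])
print(tc.function.arguments)  # '{"path": "README.md"}'
```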
Related Issues:
- llama.cpp #14697
- ollama-python #484
- litellm #8313
Context Length Configuration Issues
Common Error: "Context exceeded your setting"
Source: https://www.reddit.com/r/LocalLLM/comments/1sc82o8/hermesagent_what_is_this_message_about/
"Context exceeded your setting. Either your Hermes context or your llm server context setting for that particular model. By default context is usually set to something comically low."
Solution:
```yaml
model:
  default: your-model-name
  context_length: 32768  # Match your server's num_ctx
```
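To check for a mismatch from the client side, here is a hedged sketch against Ollama's documented `/api/show` endpoint (the `server_context_length` helper is illustrative; the `model_info` key is architecture-prefixed, and this reports the model's trained maximum rather than any lower `num_ctx` set in a Modelfile or per request):

```python
import json
import urllib.request

def server_context_length(model: str, host: str = "http://localhost:11434"):
    # POST /api/show returns model metadata, including a model_info map with
    # an architecture-prefixed key such as "llama.context_length".
    req = urllib.request.Request(
        f"{host}/api/show",
        data=json.dumps({"model": model}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        model_info = json.load(resp).get("model_info", {})
    return next(
        (v for k, v in model_info.items() if k.endswith(".context_length")), None
    )

# Compare against the context_length you set in the Hermes config:
print(server_context_length("qwen3:14b"))  # placeholder model tag
```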
Issue #879: Local Model Routing for Auxiliary Tasks
Feature Request: Route auxiliary tasks (vision, etc.) to a local endpoint independently of the main provider
Use Case: A local model for fast tasks, a cloud model for complex reasoning
Dependencies: Multi-model hybrid setup support
Windows/WSL2 Limitations
Status: Native Windows not supported
"Native Windows support is extremely experimental and unsupported. Please install WSL2 and run Hermes Agent from there."
Installation:
```bash
# Inside WSL2
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
```
Best Practices from Community
Ollama Setup
- Start the server with adequate context: `ollama run --num_ctx 16384`
- Match the context length in the Hermes config exactly
- Use `hermes model` to select "Custom endpoint"
- Base URL: `http://localhost:11434/v1`
- Leave the API key blank for local servers (see the smoke-test sketch after this list)
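A quick way to confirm those settings before pointing Hermes at the endpoint is an OpenAI-compatible smoke test. A sketch assuming the `openai` Python package and a placeholder `qwen3:14b` tag (the client library requires a non-empty API key string even though local servers ignore it):

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",  # ignored by local servers; the client needs a non-empty string
)
resp = client.chat.completions.create(
    model="qwen3:14b",  # placeholder: substitute your local model tag
    messages=[{"role": "user", "content": "ping"}],
)
print(resp.choices[0].message.content)
```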
Recommended Local Models by Use Case
| Use Case | Model | VRAM Needed |
|---|---|---|
| General agent work | Qwen 3.5 27B | 24GB |
| Fast responses | Qwen 3.5 14B | 16GB |
| Limited VRAM | Qwen 3.5 8B | 8GB |
| Experimental | Gemma 4 27B | 24GB |
Common Pitfalls
- Mismatching context lengths between Ollama and Hermes
- Assuming all models support tool calling equally well (see the probe sketch after this list)
- Not setting a max-iterations limit appropriate for local model speed
- Expecting frontier-level reliability from smaller models
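For the second pitfall, a hedged probe that checks whether a model emits a structured tool call at all before you rely on it as an agent backend (model tag and tool schema are placeholders):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

# Minimal function schema; any well-formed tool definition works as a probe.
tools = [{
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read a file from disk",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

resp = client.chat.completions.create(
    model="qwen3:14b",  # placeholder: substitute your local model tag
    messages=[{"role": "user", "content": "Read README.md using the tool."}],
    tools=tools,
)
msg = resp.choices[0].message
# A model that handles tool calling returns structured tool_calls here rather
# than describing the action in plain text.
print("tool call emitted:", bool(msg.tool_calls))
```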
Community Feedback Summary
Positive:
- "Hermes agent already works way way better than Open Claw and it actually works pretty well locally"
- Better local model support than alternatives
Challenges:
- Tool calling reliability varies by model
- Configuration complexity for beginners
- Token overhead still applies (13.9K tokens per call)