
Local Model Setup Issues & Solutions

Source reference: GitHub issues, Reddit, official FAQ, blog posts


Issue #523: Local Model Setup Skill Request

Problem: Users struggle with local model configuration

"No model recommendations: Users must know which models support tool calling. There's no guidance on model selection. No setup instructions: No docs or skills for installing/configuring Ollama, llama.cpp, or vLLM."

Requested Solution: A skill that guides users through:

  1. Setting up local models with Hermes Agent
  2. Model recommendations for different use cases
  3. Configuration nuances that trip up new users
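
The quoted gap about knowing "which models support tool calling" can also be checked empirically. Below is a rough probe, assuming an OpenAI-compatible local endpoint; the base URL, model name, and dummy weather tool are placeholders, not anything Hermes ships:

import json
import urllib.request

# All values below are placeholders: an Ollama-style OpenAI-compatible endpoint
# and a dummy weather tool, used only to see whether the model emits a tool call.
BASE_URL = "http://localhost:11434/v1"
MODEL = "your-model-name"

payload = {
    "model": MODEL,
    "messages": [{"role": "user", "content": "What is the weather in Paris right now?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
}

req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    message = json.load(resp)["choices"][0]["message"]

# A model with working tool calling should populate tool_calls
# instead of answering in plain text.
print(message.get("tool_calls"))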

Issue #1071: llama-server Compatibility (CRITICAL)

Error: 'dict' object has no attribute 'strip'

Impact: Complete failure with llama-server/Ollama backends

Fix Location: run_agent.py line ~4280

User Workaround:

# Add before: if not args or not args.strip():
# llama-server/Ollama can return tool-call arguments as a dict or list rather
# than a JSON string; serialize them, then continue so the string check isn't hit.
if isinstance(args, (dict, list)):
    tc.function.arguments = json.dumps(args)
    continue
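
For context, a standalone sketch of the same normalization applied to a whole list of tool calls; apart from tc.function.arguments, the names and loop structure are assumptions rather than actual run_agent.py code:

import json

def normalize_tool_call_args(tool_calls):
    """Coerce each tool call's arguments to a JSON string.

    llama-server/Ollama backends may hand back arguments as a dict or list,
    while downstream code expects a string (hence the .strip() AttributeError).
    """
    for tc in tool_calls:
        args = tc.function.arguments
        if isinstance(args, (dict, list)):
            tc.function.arguments = json.dumps(args)
    return tool_calls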

Related Issues:

  • llama.cpp #14697
  • ollama-python #484
  • litellm #8313

Context Length Configuration Issues

Common Error: "Context exceeded your setting"

Source: https://www.reddit.com/r/LocalLLM/comments/1sc82o8/hermesagent_what_is_this_message_about/

"Context exceeded your setting. Either your Hermes context or your llm server context setting for that particular model. By default context is usually set to something comically low."

Solution:

model:
  default: your-model-name
  context_length: 32768  # Match your server's num_ctx
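
One way to catch a mismatch is to ask the server what the model was actually loaded with and make sure context_length matches it. A sketch against Ollama's /api/show endpoint; the request and response fields vary by Ollama version (older releases expect "name" rather than "model"), so treat them as assumptions:

import json
import urllib.request

def show_ollama_model(model, base_url="http://localhost:11434"):
    """Print the Modelfile parameters Ollama reports for a model (a sketch)."""
    req = urllib.request.Request(
        f"{base_url}/api/show",
        data=json.dumps({"model": model}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        info = json.load(resp)
    # "parameters" is a newline-separated string of Modelfile parameters;
    # a num_ctx line (if present) is what Hermes's context_length should match.
    print(info.get("parameters", "(no parameters reported)"))

show_ollama_model("your-model-name")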

Issue #879: Local Model Routing for Auxiliary Tasks

Feature Request: Route auxiliary tasks (vision, etc.) to a local endpoint independently of the main provider

Use Case: Use local model for fast tasks, cloud model for complex reasoning

Dependencies: Multi-model hybrid setup support
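
Since this is an open feature request, the sketch below is purely conceptual: it illustrates the per-task routing being asked for, and none of the names or endpoints are Hermes configuration.

# Conceptual sketch of the requested behavior; all names here are hypothetical.
LOCAL_ENDPOINT = {"base_url": "http://localhost:11434/v1", "model": "local-model"}
MAIN_PROVIDER = {"base_url": "https://api.provider.example/v1", "model": "frontier-model"}

AUXILIARY_TASKS = {"vision", "image_caption", "summarize"}

def endpoint_for(task: str) -> dict:
    """Send fast auxiliary tasks to the local model; keep complex reasoning on the main provider."""
    return LOCAL_ENDPOINT if task in AUXILIARY_TASKS else MAIN_PROVIDER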


Windows/WSL2 Limitations

Status: Native Windows not supported

"Native Windows support is extremely experimental and unsupported. Please install WSL2 and run Hermes Agent from there."

Installation:

# Inside WSL2
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash

Best Practices from Community

Ollama Setup

  1. Start server with adequate context: ollama run --num_ctx 16384
  2. Match context in Hermes config exactly
  3. Use hermes model to select "Custom endpoint"
  4. Base URL: http://localhost:11434/v1
  5. Leave the API key blank for local servers (a quick connectivity check is sketched below)
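
To verify steps 4 and 5 before pointing Hermes at the endpoint, the local server can be queried directly; this assumes Ollama's OpenAI-compatible /v1/models route and needs no API key:

import json
import urllib.request

# List the models exposed at the base URL from step 4.
with urllib.request.urlopen("http://localhost:11434/v1/models") as resp:
    listing = json.load(resp)

for model in listing.get("data", []):
    print(model.get("id"))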

Use Case             Model           VRAM Needed
General agent work   Qwen 3.5 27B    24GB
Fast responses       Qwen 3.5 14B    16GB
Limited VRAM         Qwen 3.5 8B     8GB
Experimental         Gemma 4 27B     24GB
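
The VRAM column reflects common GPU sizes rather than exact requirements. As a rough back-of-the-envelope check (an approximation not taken from the source, assuming a ~4-5 bit quantization plus a flat allowance for KV cache and runtime overhead):

def estimate_vram_gb(params_billion, bits_per_weight=4.5, overhead_gb=3.0):
    """Very rough VRAM estimate: quantized weight size plus a flat allowance
    for KV cache, activations, and runtime buffers (an approximation only)."""
    weights_gb = params_billion * bits_per_weight / 8  # roughly Q4-class quantization
    return weights_gb + overhead_gb

for size in (8, 14, 27):
    print(f"{size}B parameters -> roughly {estimate_vram_gb(size):.0f} GB")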

Common Pitfalls

  1. Mismatching context lengths between Ollama and Hermes
  2. Assuming all models support tool calling equally well
  3. Not setting max iterations to a value appropriate for local model speed
  4. Expecting frontier-level reliability from smaller models

Community Feedback Summary

Positive:

  • "Hermes agent already works way way better than Open Claw and it actually works pretty well locally"
  • Better local model support than alternatives

Challenges:

  • Tool calling reliability varies by model
  • Configuration complexity for beginners
  • Token overhead still applies (13.9K tokens per call)