
Local Model Setup Issues & Solutions

Source reference: GitHub issues, Reddit, official FAQ, blog posts


Issue #523: Local Model Setup Skill Request

Problem: Users struggle with local model configuration

"No model recommendations: Users must know which models support tool calling. There's no guidance on model selection. No setup instructions: No docs or skills for installing/configuring Ollama, llama.cpp, or vLLM."

Requested Solution: A skill that guides users through:

  1. Setting up local models with Hermes Agent
  2. Model recommendations for different use cases
  3. Configuration nuances that trip up new users
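
The quoted gap about knowing "which models support tool calling" can also be checked empirically. Below is a rough probe, assuming an OpenAI-compatible local endpoint; the base URL, model name, and dummy weather tool are placeholders, not anything Hermes ships:

import json
import urllib.request

# All values below are placeholders: an Ollama-style OpenAI-compatible endpoint
# and a dummy weather tool, used only to see whether the model emits a tool call.
BASE_URL = "http://localhost:11434/v1"
MODEL = "your-model-name"

payload = {
    "model": MODEL,
    "messages": [{"role": "user", "content": "What is the weather in Paris right now?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
}

req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    message = json.load(resp)["choices"][0]["message"]

# A model with working tool calling should populate tool_calls
# instead of answering in plain text.
print(message.get("tool_calls"))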

Issue #1071: llama-server Compatibility (CRITICAL)

Error: 'dict' object has no attribute 'strip'

Impact: Complete failure with llama-server/Ollama backends

Fix Location: run_agent.py line ~4280

User Workaround:

# Add before: if not args or not args.strip():
# llama-server/Ollama can return tool-call arguments as a dict or list rather
# than a JSON string; serialize them, then continue so the string check isn't hit.
if isinstance(args, (dict, list)):
    tc.function.arguments = json.dumps(args)
    continue
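
For context, a standalone sketch of the same normalization applied to a whole list of tool calls; apart from tc.function.arguments, the names and loop structure are assumptions rather than actual run_agent.py code:

import json

def normalize_tool_call_args(tool_calls):
    """Coerce each tool call's arguments to a JSON string.

    llama-server/Ollama backends may hand back arguments as a dict or list,
    while downstream code expects a string (hence the .strip() AttributeError).
    """
    for tc in tool_calls:
        args = tc.function.arguments
        if isinstance(args, (dict, list)):
            tc.function.arguments = json.dumps(args)
    return tool_calls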

Related Issues:

  • llama.cpp #14697
  • ollama-python #484
  • litellm #8313

Context Length Configuration Issues

Common Error: "Context exceeded your setting"

Source: https://www.reddit.com/r/LocalLLM/comments/1sc82o8/hermesagent_what_is_this_message_about/

"Context exceeded your setting. Either your Hermes context or your llm server context setting for that particular model. By default context is usually set to something comically low."

Solution:

model:
  default: your-model-name
  context_length: 32768  # Match your server's num_ctx
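
One way to catch a mismatch is to ask the server what the model was actually loaded with and make sure context_length matches it. A sketch against Ollama's /api/show endpoint; the request and response fields vary by Ollama version (older releases expect "name" rather than "model"), so treat them as assumptions:

import json
import urllib.request

def show_ollama_model(model, base_url="http://localhost:11434"):
    """Print the Modelfile parameters Ollama reports for a model (a sketch)."""
    req = urllib.request.Request(
        f"{base_url}/api/show",
        data=json.dumps({"model": model}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        info = json.load(resp)
    # "parameters" is a newline-separated string of Modelfile parameters;
    # a num_ctx line (if present) is what Hermes's context_length should match.
    print(info.get("parameters", "(no parameters reported)"))

show_ollama_model("your-model-name")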

Issue #879: Local Model Routing for Auxiliary Tasks

Feature Request: Route auxiliary tasks (vision, etc.) to a local endpoint independently of the main provider

Use Case: Use local model for fast tasks, cloud model for complex reasoning

Dependencies: Multi-model hybrid setup support
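
Since this is an open feature request, the sketch below is purely conceptual: it illustrates the per-task routing being asked for, and none of the names or endpoints are Hermes configuration.

# Conceptual sketch of the requested behavior; all names here are hypothetical.
LOCAL_ENDPOINT = {"base_url": "http://localhost:11434/v1", "model": "local-model"}
MAIN_PROVIDER = {"base_url": "https://api.provider.example/v1", "model": "frontier-model"}

AUXILIARY_TASKS = {"vision", "image_caption", "summarize"}

def endpoint_for(task: str) -> dict:
    """Send fast auxiliary tasks to the local model; keep complex reasoning on the main provider."""
    return LOCAL_ENDPOINT if task in AUXILIARY_TASKS else MAIN_PROVIDER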


Windows/WSL2 Limitations

Status: Native Windows not supported

"Native Windows support is extremely experimental and unsupported. Please install WSL2 and run Hermes Agent from there."

Installation:

# Inside WSL2
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash

Best Practices from Community

Ollama Setup

  1. Start server with adequate context: ollama run --num_ctx 16384
  2. Match context in Hermes config exactly
  3. Use hermes model to select "Custom endpoint"
  4. Base URL: http://localhost:11434/v1
  5. Leave the API key blank for local servers (a quick connectivity check is sketched below)
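
To verify steps 4 and 5 before pointing Hermes at the endpoint, the local server can be queried directly; this assumes Ollama's OpenAI-compatible /v1/models route and needs no API key:

import json
import urllib.request

# List the models exposed at the base URL from step 4.
with urllib.request.urlopen("http://localhost:11434/v1/models") as resp:
    listing = json.load(resp)

for model in listing.get("data", []):
    print(model.get("id"))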

Use Case             Model           VRAM Needed
General agent work   Qwen 3.5 27B    24GB
Fast responses       Qwen 3.5 14B    16GB
Limited VRAM         Qwen 3.5 8B     8GB
Experimental         Gemma 4 27B     24GB
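
The VRAM column reflects common GPU sizes rather than exact requirements. As a rough back-of-the-envelope check (an approximation not taken from the source, assuming a ~4-5 bit quantization plus a flat allowance for KV cache and runtime overhead):

def estimate_vram_gb(params_billion, bits_per_weight=4.5, overhead_gb=3.0):
    """Very rough VRAM estimate: quantized weight size plus a flat allowance
    for KV cache, activations, and runtime buffers (an approximation only)."""
    weights_gb = params_billion * bits_per_weight / 8  # roughly Q4-class quantization
    return weights_gb + overhead_gb

for size in (8, 14, 27):
    print(f"{size}B parameters -> roughly {estimate_vram_gb(size):.0f} GB")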

Common Pitfalls

  1. Mismatching context lengths between Ollama and Hermes
  2. Assuming all models support tool calling equally well
  3. Not setting max iterations to a value appropriate for local model speed
  4. Expecting frontier-level reliability from smaller models

Community Feedback Summary

Positive:

  • "Hermes agent already works way way better than Open Claw and it actually works pretty well locally"
  • Better local model support than alternatives

Challenges:

  • Tool calling reliability varies by model
  • Configuration complexity for beginners
  • Token overhead still applies (13.9K tokens per call)