
General Local LLM Feedback for Hermes Agent

Collection Date: 2026-04-09
Sources: Reddit r/LocalLLaMA, r/LocalLLM, GitHub issues, blog posts, community discussions


Overall Assessment

Hermes Agent is widely reported to work "way better" with local models than OpenClaw. However, users face challenges with configuration complexity and model selection.


Positive Feedback

Better Than OpenClaw for Local Models

Source: https://www.reddit.com/r/LocalLLM/comments/1rye221/anyone_working_with_hermes_agent/

"It's working better for me than openclaw, and I mean with local models: when I use openclaw I can't even load up 4B models. I'm not sure why, but I decided to see if the same problem would persist with Hermes, and I didn't get this issue."

Source: https://www.reddit.com/r/LocalLLaMA/comments/1rwhi2h/running_hermes_agent_locally_with_lm_studio/

"This Hermes agent already works way way better than Open Claw and it actually works pretty well locally. I have to be super careful about exposing this to the outside world because the model is not smart enough, probably, to catch sophisticated..."

Architecture Appreciation

Source: https://www.reddit.com/r/LocalLLM/comments/1scglgq/i_looked_into_hermes_agent_architecture_to_dig/

"It identified 11 websites from pure text and hit 60% testing WebArena tasks without tuning"


Challenges and Issues

Tool Calling Reliability

Issue: Models work initially but forget which tools to use after first call

Affected: Smaller models (4B, 7B range)

"Tool calls don't always work. I use ollama with qwen3.5:4b and qwen2.5:7b, and they all tool call once, then they forget which one to use."
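
A workaround some users describe for this "forgets the tools" failure is to re-inject a short tool reminder into every turn, so the list never drops out of the model's effective attention. A minimal sketch in Python (the tool names and message shape are hypothetical illustrations, not Hermes's actual registry):

```python
# Sketch: re-inject a short tool reminder each turn so a small model
# does not "forget" the available tools after its first call.
# Tool names below are hypothetical examples.

TOOLS = ["read_file", "write_file", "run_shell", "search_code"]

def with_tool_reminder(user_message: str, tools=TOOLS) -> str:
    """Prefix every user turn with the list of callable tools."""
    reminder = ("Available tools: " + ", ".join(tools)
                + ". Call exactly one tool per step.")
    return reminder + "\n\n" + user_message

print(with_tool_reminder("Fix the failing test in utils.py"))
```

This costs a few dozen tokens per turn, which is usually a better trade than a failed tool call on a 4B-7B model.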

Context Management Confusion

Source: https://www.reddit.com/r/LocalLLM/comments/1sc82o8/hermesagent_what_is_this_message_about/

"Context exceeded your setting. Either your Hermes context or your llm server context setting for that particular model. By default context is usually set to something comically low."
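
The error involves two independent limits: the context configured in the harness and the context configured on the LLM server for that particular model (local servers such as ollama often default to only a few thousand tokens unless raised). The effective window is the smaller of the two; a minimal sketch:

```python
def effective_context(harness_ctx: int, server_ctx: int) -> int:
    """The request is cut off at whichever limit is smaller."""
    return min(harness_ctx, server_ctx)

def check_request(prompt_tokens: int, harness_ctx: int, server_ctx: int) -> str:
    limit = effective_context(harness_ctx, server_ctx)
    if prompt_tokens > limit:
        return f"Context exceeded: {prompt_tokens} > {limit} tokens"
    return "ok"

# Harness allows 32K, but the server model was left at a 4K default:
print(check_request(prompt_tokens=18_000, harness_ctx=32_768, server_ctx=4_096))
# → Context exceeded: 18000 > 4096 tokens
```

Raising only the harness setting does nothing if the server-side context for the model is still at its default.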

System Prompt Size Concerns

Source: https://www.reddit.com/r/LocalLLaMA/comments/1rwhi2h/running_hermes_agent_locally_with_lm_studio/

"Hermes has a huge system prompt. When I try to run it with Qwen-3.5 35B it's difficult..."


Model-Specific Feedback

  1. Qwen 3.5 27B - Best overall performance

    • Requires: 24GB+ VRAM
    • Speed: ~25 t/s with proper quantization
    • Tool use: Excellent
  2. Qwen 3.5 14B - Good balance

    • Requires: 16GB VRAM
    • Decent tool use reliability
  3. Qwen 3.5 8B - Minimum viable

    • Requires: 8GB VRAM
    • Tool use may be inconsistent
  Not recommended:

  • Very small models (4B and below) for complex agent tasks
  • Models without good tool-calling fine-tuning
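
The VRAM figures above can be sanity-checked with back-of-the-envelope arithmetic: quantized weights take roughly params × bits / 8 bytes, plus runtime overhead for the KV cache and buffers. A rough sketch (the 20% overhead factor is an assumption, and long contexts need considerably more):

```python
def vram_estimate_gb(params_billion: float, bits_per_weight: float = 4.0,
                     overhead: float = 1.2) -> float:
    """Very rough VRAM estimate: quantized weights plus ~20% for the
    KV cache and runtime buffers. Long contexts need much more."""
    # 1B params at 1 byte/param is ~1 GB, so GB = params_billion * bits / 8
    weight_gb = params_billion * bits_per_weight / 8
    return round(weight_gb * overhead, 1)

for size in (8, 14, 27):
    print(f"{size}B @ 4-bit ≈ {vram_estimate_gb(size)} GB")
```

Weights alone for a 27B model at 4-bit come to roughly 16 GB, which is why 24GB+ cards are recommended once context and higher-precision layers are factored in.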

Token Overhead Impact on Local Models

Critical Issue: Even local models face a fixed ~13.9K-token overhead per request

Source: GitHub Issue #4379

| Component | Tokens |
| --- | --- |
| Tool definitions (31 tools) | 8,759 |
| System prompt | 5,176 |
| Fixed overhead (total) | ~13,935 |

Impact: Local models with smaller context windows hit limits quickly due to this overhead.
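
Plugging the figures from the table into the arithmetic shows why smaller windows fail fast: an 8K context is exhausted before the conversation even starts.

```python
# Fixed per-request overhead reported in GitHub issue #4379:
TOOL_DEFS = 8_759    # 31 tool definitions
SYS_PROMPT = 5_176
OVERHEAD = TOOL_DEFS + SYS_PROMPT  # 13,935 tokens before any conversation

def remaining_budget(context_window: int) -> int:
    """Tokens left for conversation and model output after fixed overhead."""
    return context_window - OVERHEAD

for ctx in (8_192, 16_384, 32_768):
    print(f"{ctx} ctx -> {remaining_budget(ctx)} tokens left")
# An 8K window is already over budget; 16K leaves under 2.5K tokens.
```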


Community Suggestions

  1. Better documentation for local model setup
  2. Recommended model list with VRAM requirements
  3. Tool calling reliability benchmarks by model size
  4. Reduced toolset option for resource-constrained setups
  5. Better context management guidance
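
Suggestion 4 (a reduced toolset) can be sketched directly: filtering the tool registry down to a core set before building the request cuts a large share of the fixed overhead. The tool names and per-definition token costs below are hypothetical illustrations, not Hermes's actual tools:

```python
# Sketch: expose only a minimal toolset to resource-constrained models.
# Tool names and per-definition token costs are hypothetical examples.
ALL_TOOLS = {
    "read_file": 310, "write_file": 290, "run_shell": 250, "search_code": 340,
    "browser": 1200, "image_gen": 900, "spreadsheet": 800,
}
CORE = {"read_file", "write_file", "run_shell", "search_code"}

def reduced_toolset(tools: dict, keep: set) -> dict:
    """Keep only the tool definitions named in `keep`."""
    return {name: cost for name, cost in tools.items() if name in keep}

full_cost = sum(ALL_TOOLS.values())
core_cost = sum(reduced_toolset(ALL_TOOLS, CORE).values())
print(f"tool-definition tokens: {full_cost} -> {core_cost}")
```

On these illustrative numbers the definition overhead drops by roughly 70%, which also shrinks the surface a small model has to reason over when choosing a tool.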

Summary Table

| Aspect | Notes |
| --- | --- |
| Local model support | Better than alternatives |
| Setup ease | Requires technical knowledge |
| Tool calling (8B+) | Good with the right models |
| Tool calling (4B) | Inconsistent |
| Documentation | Improving, but gaps remain |
| Community support | Active and helpful |