
General Local LLM Feedback for Hermes Agent

Collection Date: 2026-04-09
Sources: Reddit r/LocalLLaMA, r/LocalLLM, GitHub issues, blog posts, community discussions


Overall Assessment

Hermes Agent is widely reported to work "way better" with local models than OpenClaw. However, users face challenges with configuration complexity and model selection.


Positive Feedback

Better Than OpenClaw for Local Models

Source: https://www.reddit.com/r/LocalLLM/comments/1rye221/anyone_working_with_hermes_agent/

"It's working better for me than openclaw, and I mean with local models: when I use openclaw I can't even load up 4B models. I'm not sure why, but I decided to see if the same problem would persist with Hermes, and I didn't get this issue."

Source: https://www.reddit.com/r/LocalLLaMA/comments/1rwhi2h/running_hermes_agent_locally_with_lm_studio/

"This Hermes agent already works way way better than Open Claw and it actually works pretty well locally. I have to be super careful about exposing this to the outside world because the model is not smart enough, probably, to catch sophisticated..."

Architecture Appreciation

Source: https://www.reddit.com/r/LocalLLM/comments/1scglgq/i_looked_into_hermes_agent_architecture_to_dig/

"It identified 11 websites from pure text and hit 60% testing WebArena tasks without tuning"


Challenges and Issues

Tool Calling Reliability

Issue: Models work initially but forget which tools to use after first call

Affected: Smaller models (4B, 7B range)

"Tool calls don't always work. I use ollama with qwen3.5:4b and qwen2.5:7b, and they all tool call once, then they forget which one to use."
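
A workaround some users describe for this "forgets the tools" failure is to re-inject a short tool reminder into every turn, so the list never drops out of the model's effective attention. A minimal sketch in Python (the tool names and message shape are hypothetical illustrations, not Hermes's actual registry):

```python
# Sketch: re-inject a short tool reminder each turn so a small model
# does not "forget" the available tools after its first call.
# Tool names below are hypothetical examples.

TOOLS = ["read_file", "write_file", "run_shell", "search_code"]

def with_tool_reminder(user_message: str, tools=TOOLS) -> str:
    """Prefix every user turn with the list of callable tools."""
    reminder = ("Available tools: " + ", ".join(tools)
                + ". Call exactly one tool per step.")
    return reminder + "\n\n" + user_message

print(with_tool_reminder("Fix the failing test in utils.py"))
```

This costs a few dozen tokens per turn, which is usually a better trade than a failed tool call on a 4B-7B model.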

Context Management Confusion

Source: https://www.reddit.com/r/LocalLLM/comments/1sc82o8/hermesagent_what_is_this_message_about/

"Context exceeded your setting. Either your Hermes context or your llm server context setting for that particular model. By default context is usually set to something comically low."
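
The error involves two independent limits: the context configured in the harness and the context configured on the LLM server for that particular model (local servers such as ollama often default to only a few thousand tokens unless raised). The effective window is the smaller of the two; a minimal sketch:

```python
def effective_context(harness_ctx: int, server_ctx: int) -> int:
    """The request is cut off at whichever limit is smaller."""
    return min(harness_ctx, server_ctx)

def check_request(prompt_tokens: int, harness_ctx: int, server_ctx: int) -> str:
    limit = effective_context(harness_ctx, server_ctx)
    if prompt_tokens > limit:
        return f"Context exceeded: {prompt_tokens} > {limit} tokens"
    return "ok"

# Harness allows 32K, but the server model was left at a 4K default:
print(check_request(prompt_tokens=18_000, harness_ctx=32_768, server_ctx=4_096))
# → Context exceeded: 18000 > 4096 tokens
```

Raising only the harness setting does nothing if the server-side context for the model is still at its default.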

System Prompt Size Concerns

Source: https://www.reddit.com/r/LocalLLaMA/comments/1rwhi2h/running_hermes_agent_locally_with_lm_studio/

"Hermes has a huge system prompt. When I try to run it with Qwen-3.5 35B it's difficult..."


Model-Specific Feedback

  1. Qwen 3.5 27B - Best overall performance

    • Requires: 24GB+ VRAM
    • Speed: ~25 t/s with proper quantization
    • Tool use: Excellent
  2. Qwen 3.5 14B - Good balance

    • Requires: 16GB VRAM
    • Decent tool use reliability
  3. Qwen 3.5 8B - Minimum viable

    • Requires: 8GB VRAM
    • Tool use may be inconsistent
  Not recommended:

  • Very small models (4B and below) for complex agent tasks
  • Models without good tool-calling fine-tuning
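
The VRAM figures above can be sanity-checked with back-of-the-envelope arithmetic: quantized weights take roughly params × bits / 8 bytes, plus runtime overhead for the KV cache and buffers. A rough sketch (the 20% overhead factor is an assumption, and long contexts need considerably more):

```python
def vram_estimate_gb(params_billion: float, bits_per_weight: float = 4.0,
                     overhead: float = 1.2) -> float:
    """Very rough VRAM estimate: quantized weights plus ~20% for the
    KV cache and runtime buffers. Long contexts need much more."""
    # 1B params at 1 byte/param is ~1 GB, so GB = params_billion * bits / 8
    weight_gb = params_billion * bits_per_weight / 8
    return round(weight_gb * overhead, 1)

for size in (8, 14, 27):
    print(f"{size}B @ 4-bit ≈ {vram_estimate_gb(size)} GB")
```

Weights alone for a 27B model at 4-bit come to roughly 16 GB, which is why 24GB+ cards are recommended once context and higher-precision layers are factored in.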

Token Overhead Impact on Local Models

Critical Issue: Even local models face a fixed ~13.9K-token overhead per request

Source: GitHub Issue #4379

| Component | Tokens |
| --- | --- |
| Tool definitions (31 tools) | 8,759 |
| System prompt | 5,176 |
| Fixed overhead (total) | ~13,935 |

Impact: Local models with smaller context windows hit limits quickly due to this overhead.
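
Plugging the figures from the table into the arithmetic shows why smaller windows fail fast: an 8K context is exhausted before the conversation even starts.

```python
# Fixed per-request overhead reported in GitHub issue #4379:
TOOL_DEFS = 8_759    # 31 tool definitions
SYS_PROMPT = 5_176
OVERHEAD = TOOL_DEFS + SYS_PROMPT  # 13,935 tokens before any conversation

def remaining_budget(context_window: int) -> int:
    """Tokens left for conversation and model output after fixed overhead."""
    return context_window - OVERHEAD

for ctx in (8_192, 16_384, 32_768):
    print(f"{ctx} ctx -> {remaining_budget(ctx)} tokens left")
# An 8K window is already over budget; 16K leaves under 2.5K tokens.
```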


Community Suggestions

  1. Better documentation for local model setup
  2. Recommended model list with VRAM requirements
  3. Tool calling reliability benchmarks by model size
  4. Reduced toolset option for resource-constrained setups
  5. Better context management guidance
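
Suggestion 4 (a reduced toolset) can be sketched directly: filtering the tool registry down to a core set before building the request cuts a large share of the fixed overhead. The tool names and per-definition token costs below are hypothetical illustrations, not Hermes's actual tools:

```python
# Sketch: expose only a minimal toolset to resource-constrained models.
# Tool names and per-definition token costs are hypothetical examples.
ALL_TOOLS = {
    "read_file": 310, "write_file": 290, "run_shell": 250, "search_code": 340,
    "browser": 1200, "image_gen": 900, "spreadsheet": 800,
}
CORE = {"read_file", "write_file", "run_shell", "search_code"}

def reduced_toolset(tools: dict, keep: set) -> dict:
    """Keep only the tool definitions named in `keep`."""
    return {name: cost for name, cost in tools.items() if name in keep}

full_cost = sum(ALL_TOOLS.values())
core_cost = sum(reduced_toolset(ALL_TOOLS, CORE).values())
print(f"tool-definition tokens: {full_cost} -> {core_cost}")
```

On these illustrative numbers the definition overhead drops by roughly 70%, which also shrinks the surface a small model has to reason over when choosing a tool.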

Summary Table

| Aspect | Notes |
| --- | --- |
| Local model support | Better than alternatives |
| Setup ease | Requires technical knowledge |
| Tool calling (8B+) | Good with the right models |
| Tool calling (4B) | Inconsistent |
| Documentation | Improving, but gaps remain |
| Community support | Active and helpful |