Harnesses under analysis:

- **opencode** - Go-based coding agent
- **pi** - minimal terminal coding harness by Mario Zechner
- **hermes** - Nous Research agent
- **forgecode** - AI pair programmer with sub-agents

Each harness folder contains:

- `repo/` - source code from the respective repository
- `feedback/localllm/` - community feedback for local/smaller models
- `feedback/frontier/` - community feedback for frontier models

Research focus: tool handling, skills systems, prompt engineering, context management, and best practices for smaller/local models.
# General Local LLM Feedback for Hermes Agent
**Collection Date:** 2026-04-09

**Sources:** Reddit r/LocalLLaMA, r/LocalLLM, GitHub issues, blog posts, community discussions
---
## Overall Assessment
Hermes Agent is widely reported to work "way better" with local models than OpenClaw. However, users face challenges with configuration complexity and model selection.
---

## Positive Feedback
### Better Than OpenClaw for Local Models
**Source:** https://www.reddit.com/r/LocalLLM/comments/1rye221/anyone_working_with_hermes_agent/
> "its worknig better for me than openclaw, this i mean with local models, when i use openclaw i cant even load up 4b models, i am not sure why but i decided to see if the same problem would persist with hermes and i dint get this issue."
**Source:** https://www.reddit.com/r/LocalLLaMA/comments/1rwhi2h/running_hermes_agent_locally_with_lm_studio/
> "This Hermes agent already works way way better than Open Claw and it actually works pretty well locally. I have to be super careful about exposing this to the outside world because the model is not smart enough, probably, to catch sophisticated..."
### Architecture Appreciation

**Source:** https://www.reddit.com/r/LocalLLM/comments/1scglgq/i_looked_into_hermes_agent_architecture_to_dig/
> "It identified 11 websites from pure text and hit 60% testing WebArena tasks without tuning"
---

## Challenges and Issues

### Tool Calling Reliability
**Issue:** Models make the first tool call successfully, then lose track of which tools are available.

**Affected:** Smaller models (4B–7B range)
> "tool calls not always work i use ollama and qwen3.5:4b qwen2.5:7b and they all tool call once than they forget which one to use"
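The failure mode above can be mitigated harness-side. A hedged sketch (not Hermes' actual code; the tool names are illustrative): validate each tool call against the registered tool names and, on failure, build a corrective message that re-lists the valid tools for the model.

```python
# Hedged sketch: validate a model's tool call and, when a small model
# "forgets" which tools exist, produce a reminder listing valid names.
# Tool names here are hypothetical, not Hermes' real toolset.
import json

REGISTERED_TOOLS = {"read_file", "write_file", "run_shell"}

def check_tool_call(raw: str) -> tuple[bool, str]:
    """Return (ok, message). On failure, message re-lists valid tools."""
    valid = ", ".join(sorted(REGISTERED_TOOLS))
    try:
        call = json.loads(raw)
        name = call["name"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return False, f"Malformed tool call. Valid tools: {valid}"
    if name not in REGISTERED_TOOLS:
        return False, f"Unknown tool '{name}'. Valid tools: {valid}"
    return True, name
```

Feeding the failure message back as the tool result gives the model another chance instead of silently derailing the session.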
### Context Management Confusion

**Source:** https://www.reddit.com/r/LocalLLM/comments/1sc82o8/hermesagent_what_is_this_message_about/
> "Context exceeded your setting. Either your Hermes context or your llm server context setting for that particular model. By default context is usually set to something comically low."
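For the server side of that "comically low" default, here is one hedged example assuming an Ollama backend (the model tag is illustrative): raise the context window via a Modelfile rather than relying on the default.

```shell
# Hedged example (assumes an Ollama backend): write a Modelfile that
# raises the context window, which defaults far below agent needs.
cat > Modelfile <<'EOF'
FROM qwen2.5:7b
PARAMETER num_ctx 16384
EOF
# Then register it (requires a running Ollama daemon):
#   ollama create qwen2.5-16k -f Modelfile
cat Modelfile
```

The harness-side context setting must be raised to match, or the error persists regardless of the server config.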
### System Prompt Size Concerns

**Source:** https://www.reddit.com/r/LocalLLaMA/comments/1rwhi2h/running_hermes_agent_locally_with_lm_studio/
> "Hermes has a huge system prompt. When I try to run it with Qwen-3.5 35B it's difficult..."
---

## Model-Specific Feedback

### Recommended for Local Use
1. **Qwen 3.5 27B** - Best overall performance
   - Requires: 24GB+ VRAM
   - Speed: ~25 t/s with proper quantization
   - Tool use: excellent

2. **Qwen 3.5 14B** - Good balance
   - Requires: 16GB VRAM
   - Tool use: decent reliability

3. **Qwen 3.5 8B** - Minimum viable
   - Requires: 8GB VRAM
   - Tool use: may be inconsistent
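The VRAM figures above can be sanity-checked with back-of-envelope arithmetic. A rough sketch, not a precise formula (it uses a flat allowance for KV cache and runtime buffers, which in reality grow with context length):

```python
# Rough VRAM estimator for quantized local models. The flat 2 GB
# overhead allowance is an assumption, not a measured figure.
def vram_gb(params_b: float, bits: int, overhead_gb: float = 2.0) -> float:
    """Approximate VRAM: weights at the given quantization width,
    plus a flat allowance for KV cache and runtime buffers."""
    weights_gb = params_b * bits / 8  # GB per billion params at N bits
    return round(weights_gb + overhead_gb, 1)

for name, params in [("27B", 27), ("14B", 14), ("8B", 8)]:
    print(f"Qwen 3.5 {name} @ Q4: ~{vram_gb(params, 4)} GB VRAM")
```

At 4-bit quantization the 27B estimate lands around 15.5 GB, consistent with the community's "24GB+ to run it comfortably with long context" guidance.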
### Not Recommended
- Very small models (4B and below) for complex agent tasks
- Models without good tool-calling fine-tuning
---
## Token Overhead Impact on Local Models

**Critical Issue:** Even local models face ~13.9K tokens of fixed overhead per request

**Source:** GitHub Issue #4379
| Component | Tokens |
|-----------|--------|
| Tool definitions (31 tools) | 8,759 |
| System prompt | 5,176 |
| **Total fixed overhead** | **~13,935** |
**Impact:** Local models with smaller context windows hit limits quickly due to this overhead.
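The arithmetic makes the impact concrete. Using the figures from the table above:

```python
# How much usable context remains after the fixed per-request
# overhead reported in GitHub Issue #4379.
TOOL_DEFS = 8_759       # 31 tool definitions
SYSTEM_PROMPT = 5_176
FIXED_OVERHEAD = TOOL_DEFS + SYSTEM_PROMPT  # 13,935 tokens

def usable_context(window: int) -> int:
    """Tokens left for conversation and files after fixed overhead."""
    return max(0, window - FIXED_OVERHEAD)

for window in (4_096, 8_192, 16_384, 32_768):
    print(f"{window:>6}-token window -> {usable_context(window):>6} usable")
```

A common local default of 8,192 tokens leaves nothing at all; even 16,384 leaves under 2,500 tokens of working room, which explains why local users hit limits almost immediately.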
---

## Community Suggestions
1. **Better documentation** for local model setup
2. **Recommended model list** with VRAM requirements
3. **Tool calling reliability benchmarks** by model size
4. **Reduced toolset option** for resource-constrained setups
5. **Better context management guidance**
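Suggestion 4 is the most direct lever on the overhead problem. A hedged sketch of what a reduced-toolset option could look like; the tool names and per-tool token counts below are illustrative, not Hermes' real definitions:

```python
# Hypothetical reduced-toolset filter: expose only a core subset of
# tools to resource-constrained local models. All figures illustrative.
FULL_TOOLSET = {  # name -> approx. definition size in tokens
    "read_file": 250, "write_file": 260, "run_shell": 300,
    "web_search": 400, "browser": 900, "image_gen": 700,
}
CORE_TOOLS = {"read_file", "write_file", "run_shell"}

def reduced_toolset(tools: dict[str, int], keep: set[str]) -> dict[str, int]:
    """Keep only the named tools' definitions."""
    return {name: size for name, size in tools.items() if name in keep}

minimal = reduced_toolset(FULL_TOOLSET, CORE_TOOLS)
saved = sum(FULL_TOOLSET.values()) - sum(minimal.values())
print(f"Dropped {len(FULL_TOOLSET) - len(minimal)} tools, saved ~{saved} tokens")
```

Since tool definitions are the single largest overhead component (8,759 of ~13,935 tokens), even a coarse filter like this would meaningfully extend small context windows.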
---
|
|
|
|
## Summary Table
|
|
|
|
| Aspect | Rating | Notes |
|--------|--------|-------|
| Local model support | ⭐⭐⭐⭐⭐ | Better than alternatives |
| Setup ease | ⭐⭐⭐ | Requires technical knowledge |
| Tool calling (8B+) | ⭐⭐⭐⭐ | Good with the right models |
| Tool calling (4B) | ⭐⭐ | Inconsistent |
| Documentation | ⭐⭐⭐ | Improving but gaps remain |
| Community support | ⭐⭐⭐⭐⭐ | Active and helpful |