# Qwen Models Feedback for Hermes Agent

**Source reference:** Multiple Reddit r/LocalLLaMA posts, GitHub issues, community discussions

---

## Model: Qwen 3.5 (Various Sizes)

### Qwen 3.5 27B - Highly Recommended

**Hardware:** Dual 3090s with UD_5XL quant from Unsloth
**Performance:** ~25 t/s at 32k context
**Source:** https://www.reddit.com/r/LocalLLaMA/comments/1ro9lph/anybody_who_tried_hermesagent/

> "The go to model for intelligence on decent hardware is qwen 3.5 27B, if you have two 3090s, use the UD_5XL quant from unsloth - its amazing. You will get about 25 t/s with this one, at a contex size of 32k, which is perfect."

### Tool Calling Performance

**Issue:** Tool calls work once, then the model forgets which tool to use
**Models affected:** Qwen 3.5 4B, Qwen 2.5 7B
**Source:** https://www.reddit.com/r/LocalLLaMA/comments/1s4yy6o/best_model_for_hermesagent/

> "I use ollama and qwen3.5:4b qwen2.5:7b and they all tool call once than they forget which one to use any recomendations for other models?"

**User hardware:** 8GB VRAM

### Qwen vs Gemma 4 Comparison

**Source:** https://www.reddit.com/r/LocalLLaMA/comments/1scbpmo/so_qwen35_or_gemma_4/

> "For me Qwen is working significantly better for tool use with novel tools (things unlike what you'd expect in OpenCode or Claude Code). Gemma keeps duplicating tool calls for some reason."

> "Gemma is failing to complete the complex challenge which qwen can succeed at (24gb VRAM) it's just giving up and claiming it's succeeded when it hasn't."
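For context on the tool-calling reports above: local backends expose the OpenAI-style chat-completions tool schema, and the model is expected to pick a tool from the `tools` array on every turn. A minimal sketch of such a request payload follows; the model tag and the `read_file` tool are illustrative placeholders, not taken from the source posts.

```python
def build_tool_call_request(model: str, user_message: str) -> dict:
    """Build an OpenAI-spec chat request exposing one example tool.

    The `read_file` tool below is a hypothetical example; an agent like
    Hermes would send its own tool definitions in the same shape.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "read_file",  # hypothetical example tool
                    "description": "Read a file from disk",
                    "parameters": {
                        "type": "object",
                        "properties": {"path": {"type": "string"}},
                        "required": ["path"],
                    },
                },
            }
        ],
    }
```

Because the tool list is resent with every request, a model that "forgets which tool to use" is failing to follow the schema across turns rather than losing access to it.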
---

## llama-server (llama.cpp) Compatibility Issue

**Issue #1071:** Critical bug with llama-server/Ollama backend
**Error:** `'dict' object has no attribute 'strip'` during tool call argument validation

**Environment:**
- OS: Windows 11 (llama-server) + Ubuntu/WSL2 (hermes-agent)
- Python: 3.11.15
- Hermes: v0.2.0
- Backend: llama-server with Qwen3.5-27B-Q4_K_M.gguf

**Root Cause:** Hermes assumes `tc.function.arguments` is always a string, but llama-server sometimes returns it as an already-parsed dict. This is a known llama-server/Ollama behavior divergence from the OpenAI spec, which defines `arguments` as a JSON-encoded string.

**Fix:**

```python
import json

# Normalize before validation: coerce parsed dict/list arguments
# back into the JSON string the OpenAI spec prescribes.
args = tc.function.arguments
if isinstance(args, (dict, list)):
    tc.function.arguments = json.dumps(args)
```

**Status:** User-submitted fix confirmed working

---

## Best Practices for Local Models

### Context Length Configuration

**Critical:** Match Ollama's `num_ctx` with the Hermes config.

> "Ollama users: If you set custom `num_ctx` (e.g., `ollama run --num_ctx 16384`), ensure matching context length in Hermes — Ollama's `/api/show` reports the model's *maximum* context, not the effective `num_ctx` configured."

**Source:** https://hermes-agent.nousresearch.com/docs/reference/faq

### Model Recommendations by VRAM

| VRAM | Recommended Model | Notes |
|------|-------------------|-------|
| 8GB | Qwen 3.5 4B | Tool calling may be inconsistent |
| 24GB | Qwen 3.5 27B (Q4_K_M) | Excellent tool use, 25 t/s |
| 48GB+ | Qwen 3.5 27B UD_5XL | Best quality, ~25 t/s at 32k ctx |

---

## General Local Model Feedback

**Positive:**
- "Hermes agent already works way way better than Open Claw and it actually works pretty well locally"
- "I have to be super careful about exposing this to the outside world because the model is not smart enough, probably, to catch sophisticated..."

**Challenges:**
- Context-exceeded errors are common with default settings
- Context length must be configured manually to match model capabilities
- Tool-calling reliability varies significantly by model size
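On the context-length pitfall noted above: since Ollama's `/api/show` only reports the model's maximum context, one defensive option is to pass `num_ctx` explicitly in each request via the native API's `options` field, keeping it in lockstep with the Hermes setting. A minimal sketch, assuming Ollama's native `/api/generate` request shape; the model tag and the 16384 value are illustrative.

```python
def build_ollama_request(model: str, prompt: str, num_ctx: int) -> dict:
    """Build an Ollama /api/generate payload with an explicit context size.

    Passing options.num_ctx per request avoids relying on whatever
    default the server was started with.
    """
    return {
        "model": model,
        "prompt": prompt,
        # Must match the context length configured in Hermes.
        "options": {"num_ctx": num_ctx},
    }
```

Whatever value is chosen here should be mirrored in the Hermes configuration, per the FAQ quote above.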