# Claude Sonnet Feedback for Hermes Agent
**Source reference:** GitHub issues, community discussions, official docs
---
## Claude Sonnet 4.5/4.6 - Primary Recommendation
**Status:** Excellent performance, commonly used as default
### Token Usage Reality Check
**Source:** https://hermes-agent.ai/blog/hermes-agent-token-overhead
| Scenario | API Calls | Est. Cost (Sonnet 4.5) |
|----------|-----------|------------------------|
| Simple bug fix | 20 | ~$6 |
| Feature implementation | 100 | ~$34 |
| Large refactor | 500 | ~$187 |
| Full project build | 1,000 | ~$405 |
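As a rough illustration of how per-task cost scales with call count, here is a minimal sketch. The per-call token counts and the $3/$15 per-million-token Sonnet prices are assumptions for illustration, not values taken from Hermes itself:

```python
# Rough per-task cost sketch. Prices and per-call token counts are
# assumptions for illustration, not Hermes internals.
INPUT_PRICE_PER_MTOK = 3.00    # assumed Sonnet 4.5 input price, USD
OUTPUT_PRICE_PER_MTOK = 15.00  # assumed Sonnet 4.5 output price, USD

def estimate_cost(api_calls: int,
                  avg_input_tokens: int = 19_000,
                  avg_output_tokens: int = 1_000) -> float:
    """Linear estimate: every call re-sends the full context."""
    input_cost = api_calls * avg_input_tokens / 1e6 * INPUT_PRICE_PER_MTOK
    output_cost = api_calls * avg_output_tokens / 1e6 * OUTPUT_PRICE_PER_MTOK
    return input_cost + output_cost

print(f"20-call task: ~${estimate_cost(20):.2f}")
```

Note that the table's figures run higher than this linear estimate (its implied per-call cost rises from ~$0.30 to ~$0.40 as tasks grow), consistent with conversation context growing over the course of a session.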
### Real-World Usage Example
**Source:** GitHub Issue #4379
**Single Evening Deployment (3 Active Sessions):**
| Session | Platform | Messages | Est. API Calls | Est. Input Tokens |
|---------|----------|----------|----------------|-------------------|
| Chat session | Telegram | 168 | ~84 | ~1.6M |
| Group chat | WhatsApp | 122 | ~61 | ~1.2M |
| Group chat | WhatsApp | 64 | ~32 | ~574K |
| **Total** | | **354** | **~207** | **~3.9M** |
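Dividing the reported input tokens by the estimated call counts is a useful sanity check on these numbers; a small sketch using the values from the table above:

```python
# Cross-check: input tokens per API call for the three reported sessions.
sessions = {
    "Telegram chat":    (84, 1_600_000),
    "WhatsApp group 1": (61, 1_200_000),
    "WhatsApp group 2": (32, 574_000),
}
for name, (calls, input_tokens) in sessions.items():
    print(f"{name}: ~{input_tokens // calls:,} input tokens per call")
```

All three sessions land in the ~18-20K tokens-per-call range, which matches the ~17,000-23,000 per-request totals in the overhead analysis below.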
---
## Token Overhead Analysis (All Models)
**Critical Finding:** Roughly 73% of a typical API call is fixed overhead (~13.9K tokens)
| Component | Tokens | % of Request |
|-----------|--------|--------------|
| Tool definitions (31 tools) | 8,759 | 46.1% |
| System prompt (SOUL.md + skills) | 5,176 | 27.2% |
| Messages (conversation) | 3,000-8,775 | 26.7% avg |
| **Total per request** | **~17,000-23,000** | |
**Impact:** This overhead is constant regardless of whether you run Sonnet, Haiku, Llama, or any OpenRouter model.
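The overhead figure can be reproduced from the table's own numbers; a quick sketch:

```python
# Fixed prefix per request, taken from the table above.
tool_definitions = 8_759
system_prompt = 5_176
fixed = tool_definitions + system_prompt   # 13,935 tokens: the ~13.9K overhead

# Conversation payload ranges from 3,000 to 8,775 tokens per request,
# so the fixed share varies with conversation size.
for messages in (3_000, 8_775):
    total = fixed + messages
    print(f"messages={messages:,}: fixed share = {fixed / total:.0%}")
```

The fixed share thus ranges from about 61% to 82% per request, bracketing the reported ~73% average.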
---
## Performance Comparison
**Source:** https://www.buildmvpfast.com/blog/hermes-agent-v04-open-source-agent-infrastructure-2026
> "One developer reported that a task taking OpenClaw 50+ tool calls and steps took Hermes 5 correct tool calls and finished 2.5 minutes faster."
---
## Best Practices for Cost Management
### 1. Use Cheaper Models for Routine Tasks
Reserve Claude/GPT-4 for complex reasoning only:
- File organization → Use Kimi, MiniMax, DeepSeek
- Simple responses → Budget models
- Complex architecture → Claude Sonnet
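A routing policy like the one above can be sketched as a simple lookup. The task categories and model IDs here are hypothetical examples, not Hermes configuration keys:

```python
# Hypothetical task-to-model routing table; categories and model IDs
# are illustrative only.
ROUTES = {
    "file_organization": "deepseek/deepseek-chat",       # cheap routine work
    "simple_response":   "moonshotai/kimi-k2",           # budget tier
    "architecture":      "anthropic/claude-sonnet-4.5",  # complex reasoning
}
CHEAP_DEFAULT = "deepseek/deepseek-chat"

def pick_model(task_kind: str) -> str:
    # Unknown tasks fall back to the cheap default, not the frontier
    # model, so a routing mistake costs little.
    return ROUTES.get(task_kind, CHEAP_DEFAULT)
```

Defaulting unknown work to the cheap tier keeps misclassification errors inexpensive; only explicitly flagged complex tasks reach Sonnet.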
### 2. Enable Caching (Where Available)
| Provider | Caching | Notes |
|----------|---------|-------|
| DeepSeek | 90% discount | Best option |
| Kimi K2.5 | 75% discount | Good option |
| Anthropic | Supported | Cache markers visible |
| OpenRouter | Partial | Depends on upstream |
| Gemini/GLM | Not supported | Full price |
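To see why a cache discount on the fixed prefix matters, here is back-of-the-envelope arithmetic. Token counts come from the overhead analysis above; the $3/MTok input price and the 90% cached-prefix discount are assumptions for illustration:

```python
PRICE_PER_TOKEN = 3.00 / 1e6     # assumed input price, USD per token
fixed, dynamic = 13_935, 5_000   # cacheable prefix vs. new tokens per call

uncached = (fixed + dynamic) * PRICE_PER_TOKEN
cached = fixed * PRICE_PER_TOKEN * 0.10 + dynamic * PRICE_PER_TOKEN
print(f"per-call input cost: ${uncached:.4f} -> ${cached:.4f}")
```

Under these assumptions, caching removes roughly two-thirds of per-call input cost, because the fixed prefix dominates every request.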
### 3. Short Sessions
Start fresh for unrelated tasks:
```bash
hermes --fresh
```
---
## User Experience Feedback
### Positive
- Excellent tool calling reliability
- Strong reasoning for complex multi-step tasks
- Good context understanding
### Cost Concerns
**From a Reddit user who quit over costs:**
> "4 million tokens in 2 hours of light usage"
**High-token triggers:**
- Terminal tool spawning
- Browser automation with screenshots
- Complex code execution with large file reads
---
## Configuration Tips
### Auxiliary Vision Model
For vision tasks, consider using a cheaper model:
```yaml
auxiliary:
  vision:
    provider: "openrouter"
    model: "google/gemini-2.5-flash"
```
Or use Codex for vision (ChatGPT Pro/Plus):
```yaml
auxiliary:
  vision:
    provider: "codex" # Uses ChatGPT OAuth token
```
---
## Summary
Claude Sonnet provides excellent performance with Hermes Agent, but users should be aware of:
1. Fixed 13.9K token overhead per request
2. Costs can accumulate quickly with active usage
3. Best used selectively for complex tasks
4. Consider cheaper alternatives for routine work