# Claude Sonnet Feedback for Hermes Agent
**Source reference:** GitHub issues, community discussions, official docs
---
## Claude Sonnet 4.5/4.6 - Primary Recommendation
**Status:** Excellent performance, commonly used as default
### Token Usage Reality Check
**Source:** https://hermes-agent.ai/blog/hermes-agent-token-overhead
| Scenario | API Calls | Est. Cost (Sonnet 4.5) |
|----------|-----------|------------------------|
| Simple bug fix | 20 | ~$6 |
| Feature implementation | 100 | ~$34 |
| Large refactor | 500 | ~$187 |
| Full project build | 1,000 | ~$405 |
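As a rough illustration of how per-task cost scales with call count, here is a minimal sketch. The per-call token counts and the $3/$15 per-million-token Sonnet prices are assumptions for illustration, not values taken from Hermes itself:

```python
# Rough per-task cost sketch. Prices and per-call token counts are
# assumptions for illustration, not Hermes internals.
INPUT_PRICE_PER_MTOK = 3.00    # assumed Sonnet 4.5 input price, USD
OUTPUT_PRICE_PER_MTOK = 15.00  # assumed Sonnet 4.5 output price, USD

def estimate_cost(api_calls: int,
                  avg_input_tokens: int = 19_000,
                  avg_output_tokens: int = 1_000) -> float:
    """Linear estimate: every call re-sends the full context."""
    input_cost = api_calls * avg_input_tokens / 1e6 * INPUT_PRICE_PER_MTOK
    output_cost = api_calls * avg_output_tokens / 1e6 * OUTPUT_PRICE_PER_MTOK
    return input_cost + output_cost

print(f"20-call task: ~${estimate_cost(20):.2f}")
```

Note that the table's figures run higher than this linear estimate (its implied per-call cost rises from ~$0.30 to ~$0.40 as tasks grow), consistent with conversation context growing over the course of a session.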
### Real-World Usage Example
**Source:** GitHub Issue #4379
**Single Evening Deployment (3 Active Sessions):**
| Session | Platform | Messages | Est. API Calls | Est. Input Tokens |
|---------|----------|----------|----------------|-------------------|
| Chat session | Telegram | 168 | ~84 | ~1.6M |
| Group chat | WhatsApp | 122 | ~61 | ~1.2M |
| Group chat | WhatsApp | 64 | ~32 | ~574K |
| **Total** | | **354** | **~207** | **~3.9M** |
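Dividing the reported input tokens by the estimated call counts is a useful sanity check on these numbers; a small sketch using the values from the table above:

```python
# Cross-check: input tokens per API call for the three reported sessions.
sessions = {
    "Telegram chat":    (84, 1_600_000),
    "WhatsApp group 1": (61, 1_200_000),
    "WhatsApp group 2": (32, 574_000),
}
for name, (calls, input_tokens) in sessions.items():
    print(f"{name}: ~{input_tokens // calls:,} input tokens per call")
```

All three sessions land in the ~18-20K tokens-per-call range, which matches the ~17,000-23,000 per-request totals in the overhead analysis below.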
---
## Token Overhead Analysis (All Models)
**Critical Finding:** Roughly 73% of a typical API call is fixed overhead (~13.9K tokens)
| Component | Tokens | % of Request |
|-----------|--------|--------------|
| Tool definitions (31 tools) | 8,759 | 46.1% |
| System prompt (SOUL.md + skills) | 5,176 | 27.2% |
| Messages (conversation) | 3,000-8,775 | 26.7% avg |
| **Total per request** | **~17,000-23,000** | |
**Impact:** This overhead is constant regardless of whether you run Sonnet, Haiku, Llama, or any OpenRouter model.
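The overhead figure can be reproduced from the table's own numbers; a quick sketch:

```python
# Fixed prefix per request, taken from the table above.
tool_definitions = 8_759
system_prompt = 5_176
fixed = tool_definitions + system_prompt   # 13,935 tokens: the ~13.9K overhead

# Conversation payload ranges from 3,000 to 8,775 tokens per request,
# so the fixed share varies with conversation size.
for messages in (3_000, 8_775):
    total = fixed + messages
    print(f"messages={messages:,}: fixed share = {fixed / total:.0%}")
```

The fixed share thus ranges from about 61% to 82% per request, bracketing the reported ~73% average.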
---
## Performance Comparison
**Source:** https://www.buildmvpfast.com/blog/hermes-agent-v04-open-source-agent-infrastructure-2026
> "One developer reported that a task taking OpenClaw 50+ tool calls and steps took Hermes 5 correct tool calls and finished 2.5 minutes faster."
---
## Best Practices for Cost Management
### 1. Use Cheaper Models for Routine Tasks
Reserve Claude/GPT-4 for complex reasoning only:
- File organization → Use Kimi, MiniMax, DeepSeek
- Simple responses → Budget models
- Complex architecture → Claude Sonnet
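A routing policy like the one above can be sketched as a simple lookup. The task categories and model IDs here are hypothetical examples, not Hermes configuration keys:

```python
# Hypothetical task-to-model routing table; categories and model IDs
# are illustrative only.
ROUTES = {
    "file_organization": "deepseek/deepseek-chat",       # cheap routine work
    "simple_response":   "moonshotai/kimi-k2",           # budget tier
    "architecture":      "anthropic/claude-sonnet-4.5",  # complex reasoning
}
CHEAP_DEFAULT = "deepseek/deepseek-chat"

def pick_model(task_kind: str) -> str:
    # Unknown tasks fall back to the cheap default, not the frontier
    # model, so a routing mistake costs little.
    return ROUTES.get(task_kind, CHEAP_DEFAULT)
```

Defaulting unknown work to the cheap tier keeps misclassification errors inexpensive; only explicitly flagged complex tasks reach Sonnet.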
### 2. Enable Caching (Where Available)
| Provider | Caching | Notes |
|----------|---------|-------|
| DeepSeek | 90% discount | Best option |
| Kimi K2.5 | 75% discount | Good option |
| Anthropic | Supported | Cache markers visible |
| OpenRouter | Partial | Depends on upstream |
| Gemini/GLM | Not supported | Full price |
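To see why a cache discount on the fixed prefix matters, here is back-of-the-envelope arithmetic. Token counts come from the overhead analysis above; the $3/MTok input price and the 90% cached-prefix discount are assumptions for illustration:

```python
PRICE_PER_TOKEN = 3.00 / 1e6     # assumed input price, USD per token
fixed, dynamic = 13_935, 5_000   # cacheable prefix vs. new tokens per call

uncached = (fixed + dynamic) * PRICE_PER_TOKEN
cached = fixed * PRICE_PER_TOKEN * 0.10 + dynamic * PRICE_PER_TOKEN
print(f"per-call input cost: ${uncached:.4f} -> ${cached:.4f}")
```

Under these assumptions, caching removes roughly two-thirds of per-call input cost, because the fixed prefix dominates every request.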
### 3. Short Sessions
Start fresh for unrelated tasks:
```bash
hermes --fresh
```
---
## User Experience Feedback
### Positive
- Excellent tool calling reliability
- Strong reasoning for complex multi-step tasks
- Good context understanding
### Cost Concerns
**From a Reddit user who quit over costs:**
> "4 million tokens in 2 hours of light usage"
**High-token triggers:**
- Terminal tool spawning
- Browser automation with screenshots
- Complex code execution with large file reads
---
## Configuration Tips
### Auxiliary Vision Model
For vision tasks, consider using a cheaper model:
```yaml
auxiliary:
  vision:
    provider: "openrouter"
    model: "google/gemini-2.5-flash"
```
Or use Codex for vision (ChatGPT Pro/Plus):
```yaml
auxiliary:
  vision:
    provider: "codex" # Uses ChatGPT OAuth token
```
---
## Summary
Claude Sonnet provides excellent performance with Hermes Agent, but users should be aware of:
1. Fixed 13.9K token overhead per request
2. Costs can accumulate quickly with active usage
3. Best used selectively for complex tasks
4. Consider cheaper alternatives for routine work