# Claude Sonnet Feedback for Hermes Agent

**Source reference:** GitHub issues, community discussions, official docs

---
## Claude Sonnet 4.5/4.6 - Primary Recommendation

**Status:** Excellent performance, commonly used as default

### Token Usage Reality Check

**Source:** https://hermes-agent.ai/blog/hermes-agent-token-overhead

| Scenario | API Calls | Est. Cost (Sonnet 4.5) |
|----------|-----------|------------------------|
| Simple bug fix | 20 | ~$6 |
| Feature implementation | 100 | ~$34 |
| Large refactor | 500 | ~$187 |
| Full project build | 1,000 | ~$405 |

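
Dividing each estimate in the table above by its call count gives an implied cost per call. The sketch below does only that arithmetic on the table's figures and assumes nothing about provider pricing; the function name `cost_per_call` is illustrative.

```python
# Figures copied from the table above: (scenario, API calls, estimated cost in USD).
SCENARIOS = [
    ("Simple bug fix", 20, 6.0),
    ("Feature implementation", 100, 34.0),
    ("Large refactor", 500, 187.0),
    ("Full project build", 1_000, 405.0),
]


def cost_per_call(calls: int, total_usd: float) -> float:
    """Implied average cost of one API call, in USD."""
    return total_usd / calls


for name, calls, total in SCENARIOS:
    print(f"{name}: ${cost_per_call(calls, total):.3f} per call")
```

The implied figure rises from about $0.30 to about $0.41 per call as the task gets larger, which would be consistent with conversation context growing over a longer session.
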
### Real-World Usage Example

**Source:** GitHub Issue #4379

**Single Evening Deployment (3 Active Sessions):**

| Session | Platform | Messages | Est. API Calls | Est. Input Tokens |
|---------|----------|----------|----------------|-------------------|
| Chat session | Telegram | 168 | ~84 | ~1.6M |
| Group chat | WhatsApp | 122 | ~61 | ~1.2M |
| Group chat | WhatsApp | 64 | ~32 | ~574K |
| **Total** | | **354** | **~207** | **~3.9M** |

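
As a rough consistency check, the three session rows can be reduced to messages per API call and input tokens per call. The sketch below uses only the rounded per-session values from the table; the session labels and the helper `per_call_stats` are illustrative. Each row works out to two messages per call and roughly 18K to 20K input tokens per call, inside the 17,000-23,000 per-request range reported in the overhead analysis. The rows as listed sum to about 177 calls and 3.4M input tokens, somewhat below the stated totals; this summary does not break down the difference.

```python
# Session rows from the table above:
# (label, messages, estimated API calls, estimated input tokens).
SESSIONS = [
    ("Telegram chat", 168, 84, 1_600_000),
    ("WhatsApp group 1", 122, 61, 1_200_000),
    ("WhatsApp group 2", 64, 32, 574_000),
]


def per_call_stats(messages: int, calls: int, input_tokens: int) -> tuple[float, float]:
    """Return (messages per API call, input tokens per API call)."""
    return messages / calls, input_tokens / calls


for label, messages, calls, tokens in SESSIONS:
    msgs_per_call, tokens_per_call = per_call_stats(messages, calls, tokens)
    print(f"{label}: {msgs_per_call:.1f} messages/call, {tokens_per_call:,.0f} input tokens/call")
```
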
---

## Token Overhead Analysis (All Models)

**Critical Finding:** On average, about 73% of every API call is fixed overhead (~13.9K tokens)

| Component | Tokens | % of Request |
|-----------|--------|--------------|
| Tool definitions (31 tools) | 8,759 | 46.1% |
| System prompt (SOUL.md + skills) | 5,176 | 27.2% |
| Messages (conversation) | 3,000-8,775 | 26.7% avg |
| **Total per request** | **~17,000-23,000** | |

**Impact:** This overhead is the same whether the model is Sonnet, Haiku, Llama, or any OpenRouter model.

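
The figures in the overhead table can be reproduced with a few lines of arithmetic. The sketch below adds the two fixed components and computes the fixed share at both ends of the reported message range; the constant and function names are illustrative, and the numbers are taken directly from the table.

```python
# Token counts from the overhead table.
TOOL_DEFINITIONS = 8_759
SYSTEM_PROMPT = 5_176
FIXED_OVERHEAD = TOOL_DEFINITIONS + SYSTEM_PROMPT  # 13,935 tokens, the "~13.9K" figure


def fixed_share(message_tokens: int) -> float:
    """Fraction of one request taken up by tool definitions and system prompt."""
    return FIXED_OVERHEAD / (FIXED_OVERHEAD + message_tokens)


# Shortest and longest conversation sizes reported in the table.
for message_tokens in (3_000, 8_775):
    total = FIXED_OVERHEAD + message_tokens
    print(f"{message_tokens:>5} message tokens: total {total:,}, fixed share {fixed_share(message_tokens):.1%}")
```
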
---

## Performance Comparison

**Source:** https://www.buildmvpfast.com/blog/hermes-agent-v04-open-source-agent-infrastructure-2026

> "One developer reported that a task taking OpenClaw 50+ tool calls and steps took Hermes 5 correct tool calls and finished 2.5 minutes faster."

---
## Best Practices for Cost Management

### 1. Use Cheaper Models for Routine Tasks

Reserve Claude/GPT-4 for complex reasoning only:

- File organization → Use Kimi, MiniMax, DeepSeek
- Simple responses → Budget models
- Complex architecture → Claude Sonnet

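
The routing rule in the list above can be sketched as a simple lookup. This is a hypothetical illustration, not Hermes configuration: the category keys, the tier labels, and the `pick_model` helper are all assumptions made for the example.

```python
# Hypothetical routing table mirroring the three bullets above.
# Keys and labels are illustrative only; they are not Hermes settings.
ROUTING = {
    "file_organization": "budget (Kimi, MiniMax, DeepSeek)",
    "simple_response": "budget",
    "complex_architecture": "Claude Sonnet",
}


def pick_model(task_category: str) -> str:
    """Return the model tier for a task category, falling back to the budget tier."""
    return ROUTING.get(task_category, "budget")


print(pick_model("complex_architecture"))
print(pick_model("file_organization"))
```

Defaulting unknown categories to the budget tier is a design choice made here to keep the expensive model opt-in.
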
### 2. Enable Caching (Where Available)

| Provider | Cache support / discount | Notes |
|----------|--------------------------|-------|
| DeepSeek | 90% off | Best option |
| Kimi K2.5 | 75% off | Good option |
| Anthropic | Full | Cache markers visible |
| OpenRouter | Partial | Depends on upstream |
| Gemini/GLM | None | Full price |

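
To see why caching matters here, the discounts in the table can be combined with the fixed overhead from the overhead analysis. The sketch below is a simplification under stated assumptions: the whole fixed overhead (8,759 + 5,176 tokens) is served from cache on every request, conversation messages are never cached, and cache-write charges and cache expiry are ignored. The 5,000-token message size and the helper name are illustrative.

```python
# Fixed overhead per request: tool definitions + system prompt (from the overhead table).
FIXED_OVERHEAD = 8_759 + 5_176


def effective_input_tokens(message_tokens: int, cache_discount: float) -> float:
    """Input tokens billed at full-price equivalent for one request.

    Assumes the entire fixed overhead is a cache hit billed at
    (1 - cache_discount) of the normal rate, and messages are billed in full.
    """
    return FIXED_OVERHEAD * (1 - cache_discount) + message_tokens


MESSAGE_TOKENS = 5_000  # illustrative conversation size
baseline = effective_input_tokens(MESSAGE_TOKENS, 0.0)

for provider, discount in [("DeepSeek", 0.90), ("Kimi K2.5", 0.75), ("No caching", 0.0)]:
    billed = effective_input_tokens(MESSAGE_TOKENS, discount)
    print(f"{provider}: {billed:,.0f} billed tokens ({1 - billed / baseline:.0%} saved)")
```

Under these assumptions the saving is bounded by the fixed share of the request, so real savings depend on how much of each request is actually a cache hit.
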
### 3. Short Sessions

Start fresh for unrelated tasks:

```bash
hermes --fresh
```

---
## User Experience Feedback

### Positive

- Excellent tool calling reliability
- Strong reasoning for complex multi-step tasks
- Good context understanding
### Cost Concerns

**Quote from a Reddit user who quit:**

> "4 million tokens in 2 hours of light usage"

**High-token triggers:**

- Terminal tool spawning
- Browser automation with screenshots
- Complex code execution with large file reads

---
## Configuration Tips

### Auxiliary Vision Model

For vision tasks, consider using a cheaper model:

```yaml
auxiliary:
  vision:
    provider: "openrouter"
    model: "google/gemini-2.5-flash"
```

Or use Codex for vision (ChatGPT Pro/Plus):

```yaml
auxiliary:
  vision:
    provider: "codex"  # Uses ChatGPT OAuth token
```

---
## Summary

Claude Sonnet performs very well with Hermes Agent, but users should keep the following in mind:

1. Each request carries a fixed overhead of about 13.9K tokens.
2. Costs can accumulate quickly with active usage.
3. Sonnet is best used selectively, for complex tasks.
4. Cheaper alternatives are worth considering for routine work.