Harnesses under analysis:
- opencode (Go-based coding agent)
- pi (minimal terminal coding harness by Mario Zechner)
- hermes (Nous Research agent)
- forgecode (AI pair programmer with sub-agents)

Each harness folder contains:
- repo/: Source code from respective repositories
- feedback/localllm/: Community feedback for local/smaller models
- feedback/frontier/: Community feedback for frontier models

Research focus: Tool handling, skills systems, prompt engineering, context management, and best practices for smaller/local models.
Claude Sonnet Feedback for Hermes Agent
Source reference: GitHub issues, community discussions, official docs
Claude Sonnet 4.5/4.6 - Primary Recommendation
Status: Excellent performance, commonly used as default
Token Usage Reality Check
Source: https://hermes-agent.ai/blog/hermes-agent-token-overhead
| Scenario | API Calls | Est. Cost (Sonnet 4.5) |
|---|---|---|
| Simple bug fix | 20 | ~$6 |
| Feature implementation | 100 | ~$34 |
| Large refactor | 500 | ~$187 |
| Full project build | 1,000 | ~$405 |
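As a rough sanity check, the table's figures can be turned into an implied average cost per API call. The sketch below uses only the numbers in the table; the names `SCENARIOS`, `cost_per_call`, and `implied_costs` are illustrative and not part of Hermes.

```python
# Rows copied from the table above: (scenario, API calls, estimated cost in USD).
SCENARIOS = [
    ("Simple bug fix", 20, 6.0),
    ("Feature implementation", 100, 34.0),
    ("Large refactor", 500, 187.0),
    ("Full project build", 1000, 405.0),
]


def cost_per_call(calls: int, total_cost: float) -> float:
    """Average cost of one API call implied by a table row."""
    return total_cost / calls


def implied_costs() -> dict[str, float]:
    """Map each scenario name to its implied per-call cost, rounded to 3 decimals."""
    return {
        name: round(cost_per_call(calls, cost), 3)
        for name, calls, cost in SCENARIOS
    }


if __name__ == "__main__":
    for name, per_call in implied_costs().items():
        print(f"{name}: ~${per_call:.3f} per call")
```

The implied per-call cost rises from about $0.30 for a simple bug fix to about $0.41 for a full project build, which would be consistent with conversation history growing over longer sessions.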
Real-World Usage Example
Source: GitHub Issue #4379
Single Evening Deployment (3 Active Sessions):
| Session | Platform | Messages | Est. API Calls | Est. Input Tokens |
|---|---|---|---|---|
| Chat session | Telegram | 168 | ~84 | ~1.6M |
| Group chat | | 122 | ~61 | ~1.2M |
| Group chat | | 64 | ~32 | ~574K |
| Total | | 354 | ~207 | ~3.9M |
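To relate these sessions to per-request sizes, the input tokens can be divided by the estimated API calls. This is a small sketch over the table's three session rows; the labels and the function name `tokens_per_call` are illustrative.

```python
# Session rows from the table above: (label, messages, est. API calls, est. input tokens).
SESSIONS = [
    ("Chat session", 168, 84, 1_600_000),
    ("Group chat (122 messages)", 122, 61, 1_200_000),
    ("Group chat (64 messages)", 64, 32, 574_000),
]


def tokens_per_call(api_calls: int, input_tokens: int) -> float:
    """Average input tokens sent per API call in one session."""
    return input_tokens / api_calls


def row_sums() -> tuple[int, int, int]:
    """Sum messages, API calls, and input tokens over the listed session rows."""
    messages = sum(row[1] for row in SESSIONS)
    calls = sum(row[2] for row in SESSIONS)
    tokens = sum(row[3] for row in SESSIONS)
    return messages, calls, tokens


if __name__ == "__main__":
    for label, _messages, calls, tokens in SESSIONS:
        print(f"{label}: ~{tokens_per_call(calls, tokens):,.0f} input tokens per call")
    print("Row sums (messages, calls, tokens):", row_sums())
```

Each session works out to roughly 18K to 20K input tokens per call. Summing the three rows gives 354 messages, 177 calls, and about 3.4M tokens; the call and token sums are below the reported totals of ~207 and ~3.9M, so the source issue may include activity that is not broken out in the rows.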
Token Overhead Analysis (All Models)
Critical Finding: On average, 73% of each API call is fixed overhead (~13.9K tokens of tool definitions and system prompt)
| Component | Tokens | % of Request |
|---|---|---|
| Tool definitions (31 tools) | 8,759 | 46.1% |
| System prompt (SOUL.md + skills) | 5,176 | 27.2% |
| Messages (conversation) | 3,000-8,775 | 26.7% avg |
| Total per request | ~17,000-23,000 | 100% |
Impact: This overhead is the same whether you use Sonnet, Haiku, Llama, or any OpenRouter model.
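The overhead share depends on how large the conversation portion of a request is. The following sketch recomputes it from the component sizes in the table; the function name `overhead_share` is illustrative, and the token counts are taken from the table as-is.

```python
TOOL_DEFINITION_TOKENS = 8_759  # 31 tool definitions, from the table
SYSTEM_PROMPT_TOKENS = 5_176    # SOUL.md + skills, from the table
FIXED_OVERHEAD_TOKENS = TOOL_DEFINITION_TOKENS + SYSTEM_PROMPT_TOKENS  # 13,935


def overhead_share(message_tokens: int) -> float:
    """Fraction of one request taken up by fixed overhead."""
    if message_tokens < 0:
        raise ValueError("message_tokens must be non-negative")
    total = FIXED_OVERHEAD_TOKENS + message_tokens
    return FIXED_OVERHEAD_TOKENS / total


if __name__ == "__main__":
    # Lower and upper bounds of the conversation size reported in the table.
    for message_tokens in (3_000, 8_775):
        share = overhead_share(message_tokens)
        print(f"{message_tokens:>5} message tokens -> {share:.1%} overhead")
```

At the table's bounds the share is about 82% for a 3,000-token conversation and about 61% for an 8,775-token one; the 73% average corresponds to roughly 5,000 message tokens per request.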
Performance Comparison
Source: https://www.buildmvpfast.com/blog/hermes-agent-v04-open-source-agent-infrastructure-2026
"One developer reported that a task taking OpenClaw 50+ tool calls and steps took Hermes 5 correct tool calls and finished 2.5 minutes faster."
Best Practices for Cost Management
1. Use Cheaper Models for Routine Tasks
Reserve Claude/GPT-4 for complex reasoning only:
- File organization → Use Kimi, MiniMax, DeepSeek
- Simple responses → Budget models
- Complex architecture → Claude Sonnet
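The routing idea above can be written down as a simple lookup. This is only an illustrative sketch: the category keys and the `candidates_for` helper are hypothetical and do not correspond to any Hermes configuration option, and falling back to Claude Sonnet for unknown categories is an assumption.

```python
# Hypothetical task categories mapped to the model tiers suggested above.
ROUTING: dict[str, list[str]] = {
    "file_organization": ["Kimi", "MiniMax", "DeepSeek"],
    "simple_response": ["budget model"],
    "complex_architecture": ["Claude Sonnet"],
}

# Assumption: unknown task types go to the strongest model rather than a cheap one.
DEFAULT_CANDIDATES = ["Claude Sonnet"]


def candidates_for(task_category: str) -> list[str]:
    """Return the suggested models for a task category, or the default tier."""
    return ROUTING.get(task_category, DEFAULT_CANDIDATES)


if __name__ == "__main__":
    for category in ("file_organization", "complex_architecture", "unknown_task"):
        print(category, "->", candidates_for(category))
```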
2. Enable Caching (Where Available)
| Provider | Cache Support / Discount | Notes |
|---|---|---|
| DeepSeek | 90% off | Best option |
| Kimi K2.5 | 75% off | Good option |
| Anthropic | Full | Cache markers visible |
| OpenRouter | Partial | Depends on upstream |
| Gemini/GLM | None | Full price |
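To see why caching matters for this workload, the discount can be applied to the fixed ~13.9K-token prefix of each request. The sketch below makes strong simplifying assumptions: every call hits the cache for the whole fixed prefix, cache writes cost nothing extra, and conversation messages are never cached. The function name `billed_input_tokens` is illustrative.

```python
FIXED_OVERHEAD_TOKENS = 13_935  # tool definitions + system prompt per request


def billed_input_tokens(message_tokens: int, cache_discount: float) -> float:
    """Full-price-equivalent input tokens for one request.

    Assumes the entire fixed prefix is served from cache at `cache_discount`
    (0.90 means 90% off) and that message tokens are billed at full price.
    """
    if not 0.0 <= cache_discount <= 1.0:
        raise ValueError("cache_discount must be between 0 and 1")
    return FIXED_OVERHEAD_TOKENS * (1.0 - cache_discount) + message_tokens


if __name__ == "__main__":
    message_tokens = 5_000  # assumed mid-range conversation size
    for label, discount in (("no cache", 0.0), ("75% off", 0.75), ("90% off", 0.90)):
        billed = billed_input_tokens(message_tokens, discount)
        print(f"{label}: ~{billed:,.0f} full-price-equivalent input tokens")
```

Under these assumptions, a 90% discount cuts a 5,000-message-token request from about 18.9K to about 6.4K full-price-equivalent input tokens. Real savings will be lower when cache entries expire or the prefix changes between calls.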
3. Short Sessions
Start fresh for unrelated tasks:
hermes --fresh
User Experience Feedback
Positive
- Excellent tool calling reliability
- Strong reasoning for complex multi-step tasks
- Good context understanding
Cost Concerns
Quote from a Reddit user who quit:
"4 million tokens in 2 hours of light usage"
High-token triggers:
- Terminal tool spawning
- Browser automation with screenshots
- Complex code execution with large file reads
Configuration Tips
Auxiliary Vision Model
For vision tasks, consider using a cheaper model:
auxiliary:
  vision:
    provider: "openrouter"
    model: "google/gemini-2.5-flash"
Or use Codex for vision (ChatGPT Pro/Plus):
auxiliary:
  vision:
    provider: "codex"  # Uses ChatGPT OAuth token
Summary
Claude Sonnet provides excellent performance with Hermes Agent, but users should keep the following in mind:
- Fixed 13.9K token overhead per request
- Costs can accumulate quickly with active usage
- Best used selectively for complex tasks
- Consider cheaper alternatives for routine work