mid_model_research/hermes/feedback/frontier/claude-sonnet-feedback.md
sleepy 51123212c4 Initial commit: coding harness feedback analysis
Harnesses under analysis:
- opencode (Go-based coding agent)
- pi (minimal terminal coding harness by Mario Zechner)
- hermes (Nous Research agent)
- forgecode (AI pair programmer with sub-agents)

Each harness folder contains:
- repo/: Source code from respective repositories
- feedback/localllm/: Community feedback for local/smaller models
- feedback/frontier/: Community feedback for frontier models

Research focus: Tool handling, skills systems, prompt engineering,
context management, and best practices for smaller/local models.
2026-04-09 15:13:45 +02:00


Claude Sonnet Feedback for Hermes Agent

Source reference: GitHub issues, community discussions, official docs


Claude Sonnet 4.5/4.6 - Primary Recommendation

Status: Excellent performance, commonly used as default

Token Usage Reality Check

Source: https://hermes-agent.ai/blog/hermes-agent-token-overhead

| Scenario | API Calls | Est. Cost (Sonnet 4.5) |
|---|---|---|
| Simple bug fix | 20 | ~$6 |
| Feature implementation | 100 | ~$34 |
| Large refactor | 500 | ~$187 |
| Full project build | 1,000 | ~$405 |
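The table gives totals only, so a per-call figure makes the scenarios easier to compare. The sketch below divides each estimated cost by its call count. The figures are copied from the table; the function and variable names are illustrative and not part of Hermes.

```python
# Figures copied from the cost table: (scenario, API calls, estimated cost in USD).
SCENARIOS = [
    ("Simple bug fix", 20, 6.0),
    ("Feature implementation", 100, 34.0),
    ("Large refactor", 500, 187.0),
    ("Full project build", 1000, 405.0),
]


def cost_per_call(calls: int, cost_usd: float) -> float:
    """Average estimated cost of a single API call in USD."""
    return cost_usd / calls


def per_call_table() -> dict[str, float]:
    """Map each scenario name to its average per-call cost, rounded to 3 decimals."""
    return {name: round(cost_per_call(calls, usd), 3) for name, calls, usd in SCENARIOS}


if __name__ == "__main__":
    for name, value in per_call_table().items():
        print(f"{name:<24} ${value:.3f} per call")
```

The average rises from $0.30 per call for the bug fix to about $0.41 for the full project build. That would be consistent with longer sessions carrying more conversation history per request, although the source does not state the cause.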

Real-World Usage Example

Source: GitHub Issue #4379

Single Evening Deployment (3 Active Sessions):

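| Session | Platform | Messages | Est. API Calls | Est. Input Tokens |
|---|---|---|---|---|
| Chat session | Telegram | 168 | ~84 | ~1.6M |
| Group chat | WhatsApp | 122 | ~61 | ~1.2M |
| Group chat | WhatsApp | 64 | ~32 | ~574K |
| Total | | 354 | ~207 | ~3.9M |

Note: the three listed sessions sum to ~177 API calls and ~3.4M input tokens. The Total row is reproduced as given and may include activity not itemized here.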
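Two ratios are implicit in the session figures: messages per API call and input tokens per API call. The sketch below derives both from the three session rows. The numbers are taken from the table; the names are illustrative.

```python
# Rows from the session table: (label, messages, est. API calls, est. input tokens).
SESSIONS = [
    ("Chat session (Telegram)", 168, 84, 1_600_000),
    ("Group chat (WhatsApp)", 122, 61, 1_200_000),
    ("Group chat (WhatsApp)", 64, 32, 574_000),
]


def summarize(sessions):
    """Return per-session ratios: messages per call and input tokens per call."""
    rows = []
    for label, messages, calls, tokens in sessions:
        rows.append(
            {
                "session": label,
                "messages_per_call": messages / calls,
                "tokens_per_call": tokens / calls,
            }
        )
    return rows


if __name__ == "__main__":
    for row in summarize(SESSIONS):
        print(
            f"{row['session']:<26} "
            f"{row['messages_per_call']:.1f} msgs/call, "
            f"{row['tokens_per_call']:,.0f} tokens/call"
        )
```

Each session works out to two messages per estimated API call and roughly 18K to 20K input tokens per call, which lines up with the ~17,000-23,000 per-request range reported in the overhead analysis.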

Token Overhead Analysis (All Models)

Critical Finding: On average, about 73% of each API request is fixed overhead (~13.9K tokens of tool definitions and system prompt)

| Component | Tokens | % of Request |
|---|---|---|
| Tool definitions (31 tools) | 8,759 | 46.1% |
| System prompt (SOUL.md + skills) | 5,176 | 27.2% |
| Messages (conversation) | 3,000-8,775 | 26.7% avg |
| Total per request | ~17,000-23,000 | |

Impact: This overhead is the same whether the request goes to Sonnet, Haiku, Llama, or any OpenRouter model.
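The 73% figure is an average. The fixed share shifts with the amount of conversation history in the request. The sketch below recomputes it from the component counts in the table. The constants come from the table; the function name is illustrative.

```python
# Token counts from the overhead table.
TOOL_DEFINITION_TOKENS = 8_759
SYSTEM_PROMPT_TOKENS = 5_176
FIXED_OVERHEAD_TOKENS = TOOL_DEFINITION_TOKENS + SYSTEM_PROMPT_TOKENS  # 13,935


def overhead_share(conversation_tokens: int) -> float:
    """Fraction of one request taken up by the fixed overhead."""
    total = FIXED_OVERHEAD_TOKENS + conversation_tokens
    return FIXED_OVERHEAD_TOKENS / total


if __name__ == "__main__":
    # The low and high ends of the conversation range reported in the table.
    for conversation_tokens in (3_000, 8_775):
        share = overhead_share(conversation_tokens)
        print(f"{conversation_tokens:>6} conversation tokens -> {share:.1%} fixed overhead")
```

With 3,000 conversation tokens the fixed share is about 82%; with 8,775 it drops to about 61%. The reported 73% average sits between the two.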


Performance Comparison

Source: https://www.buildmvpfast.com/blog/hermes-agent-v04-open-source-agent-infrastructure-2026

"One developer reported that a task taking OpenClaw 50+ tool calls and steps took Hermes 5 correct tool calls and finished 2.5 minutes faster."


Best Practices for Cost Management

1. Use Cheaper Models for Routine Tasks

Reserve Claude/GPT-4 for complex reasoning only:

  • File organization → Use Kimi, MiniMax, DeepSeek
  • Simple responses → Budget models
  • Complex architecture → Claude Sonnet
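The routing rule above can be expressed as a small lookup. This is only a sketch: the task labels and tier names are hypothetical and do not correspond to any Hermes configuration key or API.

```python
# Hypothetical task labels mapped to a model tier, following the list above.
ROUTES = {
    "file_organization": "budget",      # e.g. Kimi, MiniMax, DeepSeek
    "simple_response": "budget",
    "complex_architecture": "claude-sonnet",
}

# Assumption: unknown task types fall back to the stronger model,
# trading cost for reliability.
DEFAULT_TIER = "claude-sonnet"


def pick_model_tier(task_type: str) -> str:
    """Return the model tier for a task label, defaulting to the frontier model."""
    return ROUTES.get(task_type, DEFAULT_TIER)


if __name__ == "__main__":
    for task in ("file_organization", "complex_architecture", "unlabelled_task"):
        print(f"{task:<22} -> {pick_model_tier(task)}")
```

Defaulting to the stronger model is a design choice. A cost-sensitive setup might instead default to the budget tier and escalate only on failure.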

2. Enable Caching (Where Available)

| Provider | Cache Support / Discount | Notes |
|---|---|---|
| DeepSeek | 90% off | Best option |
| Kimi K2.5 | 75% off | Good option |
| Anthropic | Full | Cache markers visible |
| OpenRouter | Partial | Depends on upstream |
| Gemini/GLM | None | Full price |
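Caching matters here because the fixed overhead is repeated on every request. The sketch below estimates the relative input-cost saving under simplifying assumptions that are not from the source: only the ~13.9K-token fixed prefix is cached, the first call pays full price, later calls pay the discounted rate on that prefix, and cache-write surcharges and cache expiry are ignored. The discount values come from the table; the 5,000-token conversation size is an illustrative pick from the reported 3,000-8,775 range.

```python
FIXED_OVERHEAD_TOKENS = 13_935  # tool definitions + system prompt, from the overhead analysis


def billable_input_tokens(calls: int, conversation_tokens: int, cache_discount: float = 0.0) -> float:
    """Input tokens billed at full-price equivalents over a session.

    cache_discount is the fraction taken off cached tokens (0.9 means 90% off).
    Assumes a constant conversation size per call, which is a simplification.
    """
    if calls <= 0:
        return 0.0
    first_call = FIXED_OVERHEAD_TOKENS + conversation_tokens
    later_call = FIXED_OVERHEAD_TOKENS * (1.0 - cache_discount) + conversation_tokens
    return first_call + (calls - 1) * later_call


def relative_savings(calls: int, conversation_tokens: int, cache_discount: float) -> float:
    """Fraction of input cost saved compared with no caching at all."""
    full = billable_input_tokens(calls, conversation_tokens)
    cached = billable_input_tokens(calls, conversation_tokens, cache_discount)
    return 1.0 - cached / full


if __name__ == "__main__":
    for provider, discount in (("DeepSeek", 0.90), ("Kimi K2.5", 0.75)):
        saved = relative_savings(calls=100, conversation_tokens=5_000, cache_discount=discount)
        print(f"{provider:<10} {saved:.1%} lower input cost over 100 calls")
```

Because the result is a ratio, no per-token price is needed. Actual savings depend on each provider's cache rules and on how much of the conversation history is also served from cache.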

3. Short Sessions

Start fresh for unrelated tasks:

hermes --fresh

User Experience Feedback

Positive

  • Excellent tool calling reliability
  • Strong reasoning for complex multi-step tasks
  • Good context understanding

Cost Concerns

Quote from a Reddit user who quit using Hermes:

"4 million tokens in 2 hours of light usage"

High-token triggers:

  • Terminal tool spawning
  • Browser automation with screenshots
  • Complex code execution with large file reads

Configuration Tips

Auxiliary Vision Model

For vision tasks, consider using a cheaper model:

auxiliary:
  vision:
    provider: "openrouter"
    model: "google/gemini-2.5-flash"

Or use Codex for vision (ChatGPT Pro/Plus):

auxiliary:
  vision:
    provider: "codex"  # Uses ChatGPT OAuth token

Summary

Claude Sonnet performs very well with Hermes Agent, but users should keep the following in mind:

  1. Every request carries a fixed overhead of ~13.9K tokens
  2. Costs accumulate quickly with active usage
  3. Sonnet is best used selectively for complex tasks
  4. Cheaper alternatives are worth considering for routine work