sleepy 51123212c4 Initial commit: coding harness feedback analysis

Harnesses under analysis:
- opencode (Go-based coding agent)
- pi (minimal terminal coding harness by Mario Zechner)
- hermes (Nous Research agent)
- forgecode (AI pair programmer with sub-agents)

Each harness folder contains:
- repo/: Source code from respective repositories
- feedback/localllm/: Community feedback for local/smaller models
- feedback/frontier/: Community feedback for frontier models

Research focus: Tool handling, skills systems, prompt engineering,
context management, and best practices for smaller/local models.

2026-04-09 15:13:45 +02:00

3.5 KiB

Claude Sonnet Feedback for Hermes Agent

Source reference: GitHub issues, community discussions, official docs

Claude Sonnet 4.5/4.6 - Primary Recommendation

Status: Excellent performance, commonly used as default

Token Usage Reality Check

Source: https://hermes-agent.ai/blog/hermes-agent-token-overhead

Scenario	API Calls	Est. Cost (Sonnet 4.5)
Simple bug fix	20	~$6
Feature implementation	100	~$34
Large refactor	500	~$187
Full project build	1,000	~$405
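The estimates above do not scale linearly with call count. As a minimal sketch (figures copied from the table; the helper name `cost_per_call` is purely illustrative), dividing each estimate by its number of calls suggests the average cost per call rises with session length, which would be consistent with conversation history growing over a session:

```python
# Figures copied from the table above: (scenario, API calls, estimated cost in USD).
SCENARIOS = [
    ("Simple bug fix", 20, 6.0),
    ("Feature implementation", 100, 34.0),
    ("Large refactor", 500, 187.0),
    ("Full project build", 1000, 405.0),
]


def cost_per_call(calls: int, cost_usd: float) -> float:
    """Average estimated cost of one API call in a scenario (illustrative helper)."""
    return cost_usd / calls


for name, calls, cost in SCENARIOS:
    print(f"{name:<24} {cost_per_call(calls, cost):.3f} USD/call")
```

On these numbers the per-call average moves from about $0.30 for a short bug fix to about $0.41 for a full project build.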

Real-World Usage Example

Source: GitHub Issue #4379

Single Evening Deployment (3 Active Sessions):

Session	Platform	Messages	Est. API Calls	Est. Input Tokens
Chat session	Telegram	168	~84	~1.6M
Group chat	WhatsApp	122	~61	~1.2M
Group chat	WhatsApp	64	~32	~574K
Total		354	~207	~3.9M
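As a rough cross-check, the per-session rows can be turned into input tokens per API call. The sketch below uses only the three session rows from the table (the session labels and the helper name are illustrative) and lands in the same ~17,000-23,000 tokens-per-request range reported in the overhead analysis:

```python
# Per-session rows from the table above: (label, est. API calls, est. input tokens).
SESSIONS = [
    ("Telegram chat session", 84, 1_600_000),
    ("WhatsApp group chat (122 msgs)", 61, 1_200_000),
    ("WhatsApp group chat (64 msgs)", 32, 574_000),
]


def tokens_per_call(calls: int, input_tokens: int) -> float:
    """Average estimated input tokens sent per API call (illustrative helper)."""
    return input_tokens / calls


for label, calls, tokens in SESSIONS:
    print(f"{label:<32} {tokens_per_call(calls, tokens):,.0f} tokens/call")
```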

Token Overhead Analysis (All Models)

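Critical Finding: Roughly 73% of the tokens in an average API request are fixed overhead (~13.9K tokens of tool definitions and system prompt), sent before any conversation content.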

Component	Tokens	% of Request
Tool definitions (31 tools)	8,759	46.1%
System prompt (SOUL.md + skills)	5,176	27.2%
Messages (conversation)	3,000-8,775	26.7% avg
Total per request	~17,000-23,000

Impact: This overhead is the same whether the model is Sonnet, Haiku, Llama, or any other model served through OpenRouter.
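The 73% figure is an average; the fixed share depends on how much conversation is attached. A minimal sketch of the arithmetic, using the token counts from the table (the 5,065-token midpoint is an assumption derived from the 26.7% average of a ~19,000-token request, not a number stated in the source):

```python
# Token counts taken from the table above.
TOOL_DEFINITION_TOKENS = 8_759
SYSTEM_PROMPT_TOKENS = 5_176
FIXED_OVERHEAD_TOKENS = TOOL_DEFINITION_TOKENS + SYSTEM_PROMPT_TOKENS  # 13,935


def overhead_share(message_tokens: int) -> float:
    """Fraction of one request's tokens that is fixed overhead."""
    total = FIXED_OVERHEAD_TOKENS + message_tokens
    return FIXED_OVERHEAD_TOKENS / total


# 3,000 and 8,775 are the table's bounds; 5,065 is the assumed average.
for message_tokens in (3_000, 5_065, 8_775):
    share = overhead_share(message_tokens)
    print(f"{message_tokens:>5} message tokens -> {share:.1%} fixed overhead")
```

With short conversations the fixed share is above 80%, and it falls to about 61% at the upper end of the reported message range.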

Performance Comparison

Source: https://www.buildmvpfast.com/blog/hermes-agent-v04-open-source-agent-infrastructure-2026

"One developer reported that a task taking OpenClaw 50+ tool calls and steps took Hermes 5 correct tool calls and finished 2.5 minutes faster."

Best Practices for Cost Management

1. Use Cheaper Models for Routine Tasks

Reserve Claude/GPT-4 for complex reasoning only:

File organization → Use Kimi, MiniMax, DeepSeek
Simple responses → Budget models
Complex architecture → Claude Sonnet
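One way to picture this split is a small lookup from task category to model tier. The sketch below is hypothetical: the category names, the `Tier` enum and `pick_tier` are made up for illustration and are not part of Hermes Agent. Defaulting unknown categories to the budget tier is an assumption that follows the advice to reserve frontier models for complex reasoning only.

```python
from enum import Enum


class Tier(Enum):
    BUDGET = "budget"      # e.g. Kimi, MiniMax, DeepSeek
    FRONTIER = "frontier"  # e.g. Claude Sonnet


# Hypothetical task categories mirroring the three lines above.
ROUTING = {
    "file_organization": Tier.BUDGET,
    "simple_response": Tier.BUDGET,
    "complex_architecture": Tier.FRONTIER,
}


def pick_tier(task_category: str) -> Tier:
    """Return the model tier for a task; unknown tasks fall back to BUDGET (assumption)."""
    return ROUTING.get(task_category, Tier.BUDGET)


print(pick_tier("complex_architecture").value)
print(pick_tier("file_organization").value)
```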

2. Enable Caching (Where Available)

Provider	Cache Support / Discount	Notes
DeepSeek	90% off	Best option
Kimi K2.5	75% off	Good option
Anthropic	Full	Cache markers visible
OpenRouter	Partial	Depends on upstream
Gemini/GLM	None	Full price
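To get a feel for what these discounts could mean, the sketch below estimates the full-price-equivalent input tokens per request. It rests on simplifying assumptions that are not from the source: the whole ~13.9K fixed overhead is served from cache on every call, the discount applies only to those cached tokens, conversation tokens are billed at full price, and cache-write costs are ignored. The 5,065-token message size is the average implied by the overhead analysis.

```python
FIXED_OVERHEAD_TOKENS = 13_935  # tool definitions + system prompt
AVG_MESSAGE_TOKENS = 5_065      # assumed average conversation payload

# Discounts from the table above, as fractions of the normal input price.
CACHE_DISCOUNT = {
    "DeepSeek": 0.90,
    "Kimi K2.5": 0.75,
    "Gemini/GLM": 0.00,
}


def billed_equivalent_tokens(discount: float) -> float:
    """Full-price-equivalent input tokens per request under the stated assumptions."""
    return FIXED_OVERHEAD_TOKENS * (1 - discount) + AVG_MESSAGE_TOKENS


for provider, discount in CACHE_DISCOUNT.items():
    tokens = billed_equivalent_tokens(discount)
    print(f"{provider:<11} ~{tokens:,.0f} full-price-equivalent tokens/request")
```

Under these assumptions a 90% cache discount cuts the billed input per request from about 19,000 to about 6,500 token-equivalents; real savings depend on cache hit rates and each provider's pricing rules.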

3. Keep Sessions Short

Start fresh for unrelated tasks:

hermes --fresh

User Experience Feedback

Positive

Excellent tool calling reliability
Strong reasoning for complex multi-step tasks
Good context understanding

Cost Concerns

Quote from a Reddit user who quit:

"4 million tokens in 2 hours of light usage"

High-token triggers:

Terminal tool spawning
Browser automation with screenshots
Complex code execution with large file reads

Configuration Tips

Auxiliary Vision Model

For vision tasks, consider using a cheaper model:

auxiliary:
  vision:
    provider: "openrouter"
    model: "google/gemini-2.5-flash"

Or use Codex for vision (ChatGPT Pro/Plus):

auxiliary:
  vision:
    provider: "codex"  # Uses ChatGPT OAuth token

Summary

Claude Sonnet performs very well with Hermes Agent, but users should keep the following in mind:

Every request carries a fixed overhead of about 13.9K tokens
Costs accumulate quickly with active usage
Sonnet is best reserved for complex tasks
Cheaper models are a better fit for routine work
