Harnesses under analysis:
- opencode (Go-based coding agent)
- pi (minimal terminal coding harness by Mario Zechner)
- hermes (Nous Research agent)
- forgecode (AI pair programmer with sub-agents)

Each harness folder contains:
- repo/: Source code from respective repositories
- feedback/localllm/: Community feedback for local/smaller models
- feedback/frontier/: Community feedback for frontier models

Research focus: Tool handling, skills systems, prompt engineering, context management, and best practices for smaller/local models.
Feature Feedback and User Experience
Collection Date: 2026-04-09
Sources: GitHub issues, blog posts, community discussions, documentation
Skills System
Positive Feedback
Self-Improvement Loop:
"The agent can transform what it learns into reusable skills, improve them through experience, store useful information, and even search for previous conversations."
Progressive Disclosure:
- Level 0: Skill names/descriptions (~3,000 tokens)
- Level 1: Full skill content when needed
- Level 2: Specific reference files
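The three-level scheme above can be sketched as a loader that keeps only the cheap catalog in every request and pays for deeper levels on demand. A minimal Python sketch; the `Skill`/`SkillIndex` names are illustrative, not the harness's actual API:

```python
# Minimal sketch of progressive skill disclosure: the context starts
# with only names/descriptions (Level 0) and loads deeper levels on
# demand. Class and method names are illustrative, not Hermes's API.
from dataclasses import dataclass, field

@dataclass
class Skill:
    name: str
    description: str          # Level 0: always in context
    content: str = ""         # Level 1: loaded when the skill is invoked
    references: dict = field(default_factory=dict)  # Level 2: per-file

class SkillIndex:
    def __init__(self, skills):
        self.skills = {s.name: s for s in skills}

    def level0(self):
        """Cheap catalog injected into every request (~3,000 tokens)."""
        return "\n".join(f"{s.name}: {s.description}" for s in self.skills.values())

    def level1(self, name):
        """Full skill body, fetched only when the agent selects the skill."""
        return self.skills[name].content

    def level2(self, name, ref):
        """A specific reference file inside the skill."""
        return self.skills[name].references[ref]

index = SkillIndex([
    Skill("git-bisect", "Binary-search a regression",
          content="1. git bisect start ...",
          references={"cheatsheet.md": "bisect run <script>"}),
])
print(index.level0())   # the always-on catalog
```

The point of the split is that Level 1 and Level 2 costs are only incurred for the skill the agent actually selects.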
Skill Creation:
- Auto-generated after complex tasks (5+ tool calls)
- Can be hand-written
- Installable from Skills Hub
- Shareable via agentskills.io format
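The "auto-generated after complex tasks" trigger reduces to a simple heuristic over the finished task's tool-call log. A sketch under that assumption; the function name and the exact trigger logic are hypothetical:

```python
# Hypothetical sketch of the skill auto-generation trigger: a finished
# task that needed 5+ tool calls (per the feedback above) is flagged
# as a candidate for distillation into a reusable skill.
TOOL_CALL_THRESHOLD = 5

def should_create_skill(tool_calls: list[str]) -> bool:
    """Flag a completed task as skill-worthy if it used 5+ tool calls."""
    return len(tool_calls) >= TOOL_CALL_THRESHOLD

assert should_create_skill(["read", "edit", "run", "read", "edit"])
assert not should_create_skill(["read", "edit"])
```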
Community Contributions
Awesome Hermes Agent: https://github.com/0xNyk/awesome-hermes-agent
- Curated list of skills, tools, integrations
- Four plugins covering common operational needs
- Inter-agent bridge for multiple Hermes instances
- Hermes-skill-factory (auto-generates skills from workflows)
Memory System
Architecture
Three Layers:
- Short-term - Recent context in conversation
- Long-term - MEMORY.md (facts, conventions, lessons)
- Episodic - SQLite FTS5 search across all sessions
Storage:
- MEMORY.md (~2,200 chars) - Always in context
- USER.md (~1,375 chars) - User preferences
- ~/.hermes/state.db - SQLite with full-text search
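The episodic layer can be sketched with SQLite's FTS5 full-text index, which is what the feedback says backs session search. A minimal sketch; the table name and schema are assumptions, not the real layout of `~/.hermes/state.db`:

```python
# Sketch of the episodic layer: full-text search over past sessions
# via SQLite FTS5. The schema is an assumption for illustration.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE sessions USING fts5(ts, content)")
db.executemany(
    "INSERT INTO sessions VALUES (?, ?)",
    [("2026-04-01", "discussed the production database on port 5433"),
     ("2026-04-03", "fixed a TypeError in api/handlers.py")],
)

def recall(query: str):
    """'Did we discuss X last week?' -> matching past sessions."""
    return db.execute(
        "SELECT ts, content FROM sessions WHERE sessions MATCH ? ORDER BY rank",
        (query,),
    ).fetchall()

print(recall("database"))  # matches only the 2026-04-01 session
```

This is exactly the "session search" half of the split: nothing here is injected automatically; the agent has to run the query.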
User Confusion Points
Source: https://vectorize.io/articles/hermes-agent-memory-not-working
"Memory is for critical facts that should always be in context. Session search is for 'did we discuss X last week?' queries where the agent needs to recall — it doesn't happen automatically before every response."
Common Misconception: The agent should automatically remember everything.
Reality: The user must explicitly ask the agent to remember: "Remember that my production database runs on port 5433"
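The explicit-only behavior can be sketched as a gate in front of the long-term store: nothing is written unless the user asks. The trigger phrase, file path, and function name below are illustrative, not the harness's actual logic:

```python
# Sketch of explicit long-term memory: a fact is appended to MEMORY.md
# only when the user explicitly asks. Trigger phrase and path are
# illustrative assumptions.
from pathlib import Path

MEMORY_FILE = Path("MEMORY.md")  # stands in for the real memory file

def maybe_remember(user_message: str) -> bool:
    """Write to long-term memory only on an explicit 'Remember that ...'."""
    prefix = "remember that "
    if not user_message.lower().startswith(prefix):
        return False  # nothing is stored automatically
    fact = user_message[len(prefix):].strip()
    with MEMORY_FILE.open("a") as f:
        f.write(f"- {fact}\n")
    return True

maybe_remember("Remember that my production database runs on port 5433")
```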
Delegation and Subagents
Performance Benefits
"Use delegate_task with parallel subtasks. Each subagent runs independently with its own context, and only the final summaries come back — massively reducing your main conversation's token usage."
Best Practices
- Set max_iterations lower for simple tasks (default: 50)
- Be specific in goals - "Fix the TypeError in api/handlers.py line 47" not "Fix the bug"
- Include file paths - Subagents don't know your project structure
- Use for context isolation - Prevents main conversation bloat
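The best practices above can be sketched as parallel fan-out where each subtask carries a specific goal (with file paths) and a tuned iteration cap, and only short summaries flow back. `run_subagent` below is a stand-in for the real `delegate_task` tool, not its actual signature:

```python
# Sketch of context isolation via parallel delegation: each subagent
# runs in its own context; only a short summary returns to the main
# conversation. run_subagent is a stand-in, not the real tool API.
from concurrent.futures import ThreadPoolExecutor

def run_subagent(goal: str, max_iterations: int = 50) -> str:
    """Does the work in an isolated context, returns ONLY a summary."""
    # ... the full tool-call transcript stays inside the subagent ...
    return f"done ({max_iterations} iter cap): {goal}"

goals = [
    "Fix the TypeError in api/handlers.py line 47",   # specific, with a path
    "Add a regression test in tests/test_handlers.py",
]
with ThreadPoolExecutor() as pool:
    # Simple tasks get a lower iteration cap, per the best practices above.
    summaries = list(pool.map(lambda g: run_subagent(g, max_iterations=10), goals))

for s in summaries:
    print(s)  # only these summaries reach the main context
```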
Multi-Agent Architecture (Future)
Issue #344 Proposal:
- L0: Current (exists today)
- L1: Workflow engine
- L2: Checkpointing and recovery
- L3: Full orchestration
Cron and Scheduling
Use Cases
Examples:
"Every morning at 9am, check Hacker News for AI news and send me a summary on Telegram."
"Weekly dependency audit every Sunday at 6 AM"
Features
- Output automatically delivered to configured platform
- Job output saved to ~/.hermes/cron/output/<job-id>/<timestamp>.md
- Test with /cron run <job_id> before scheduling
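The output location follows directly from the path pattern above. A sketch of how such a path could be built; the helper name and the exact timestamp format are assumptions (only the `~/.hermes/cron/output/<job-id>/<timestamp>.md` pattern comes from the docs):

```python
# Sketch of the cron output path pattern
# ~/.hermes/cron/output/<job-id>/<timestamp>.md. The timestamp format
# used here is an assumption for illustration.
from datetime import datetime, timezone
from pathlib import Path

def cron_output_path(job_id: str, home: Path = Path.home()) -> Path:
    ts = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return home / ".hermes" / "cron" / "output" / job_id / f"{ts}.md"

print(cron_output_path("hn-digest"))
```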
Limitations
- Agent only sees script stdout
- Background execution requires proper setup
Gateway and Messaging
Supported Platforms
Full List:
- Telegram
- Discord
- Slack
- Signal
- SMS
- Home Assistant
- Matrix/Mattermost
- DingTalk/Feishu/WeCom
Cross-Platform Continuity
"Instructions are given via Telegram in the morning, and progress is checked via Discord at night. It's seamless."
Voice Support
- Voice memo transcription on all platforms
- TTS output with the /voice command
- Discord voice channel support
Terminal Backends
Options
- Local (default)
- Docker (sandboxed)
- SSH (remote server)
- Daytona (serverless persistence)
- Singularity
- Modal (serverless, hibernates when idle)
Security
- Container hardening with read-only root
- Dropped capabilities
- Namespace isolation
- Dangerous command approval system
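A dangerous-command approval system typically pattern-matches shell commands before execution and holds risky ones for user confirmation. A minimal sketch; the pattern list is illustrative, not Hermes's actual rule set:

```python
# Sketch of a dangerous-command approval gate: commands matching risky
# patterns require explicit user approval before they run. The pattern
# list below is illustrative, not the harness's real rules.
import re

DANGEROUS_PATTERNS = [
    r"\brm\s+-rf\b",              # recursive force delete
    r"\bmkfs\b",                  # filesystem format
    r"\bdd\s+if=",                # raw disk writes
    r"curl\s+[^|]*\|\s*(ba)?sh",  # pipe-to-shell installs
]

def needs_approval(command: str) -> bool:
    """True if the command must be confirmed by the user before running."""
    return any(re.search(p, command) for p in DANGEROUS_PATTERNS)

assert needs_approval("rm -rf /tmp/build")
assert not needs_approval("ls -la")
```

Pairing a gate like this with a read-only root and dropped capabilities gives defense in depth: the gate catches intent, the container limits blast radius.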
Browser and Vision
Browser Tools
Tool set: browser_navigate, browser_click, browser_snapshot, browser_type, etc. (11 tools total)
Cost Impact:
- Browser tools add ~1,258 tokens to every request (even when unused in messaging)
- Screenshots + vision analysis are high-token operations
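One mitigation for the ~1,258-token overhead is to register browser tool schemas only on requests that can plausibly use them. A sketch of that idea; the tool lists, channel names, and helper are illustrative assumptions, not a documented Hermes feature:

```python
# Sketch of trimming the fixed browser-tool token overhead: only attach
# browser tool schemas when the current request can actually use them.
# Tool and channel names here are illustrative assumptions.
BROWSER_TOOLS = {"browser_navigate", "browser_click",
                 "browser_snapshot", "browser_type"}  # 4 of the 11
CORE_TOOLS = {"read_file", "write_file", "run_command"}

def tools_for_request(channel: str, wants_browser: bool) -> set[str]:
    """Drop browser schemas on messaging requests that never use them."""
    tools = set(CORE_TOOLS)
    if wants_browser:
        tools |= BROWSER_TOOLS
    return tools

# A messaging request with no browsing need pays zero browser overhead:
assert "browser_click" not in tools_for_request("telegram", wants_browser=False)
assert "browser_click" in tools_for_request("cli", wants_browser=True)
```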
Vision Analysis
Supported:
- Image URLs via vision_analyze
- Image paste in CLI (with xclip/X11 forwarding)
- Images via messaging platforms
Voice Mode
Features
- STT: faster-whisper (local, free)
- TTS: Microsoft Edge TTS (free)
- Recording: Ctrl+B in CLI
- Cross-platform: Works in Telegram, Discord, etc.
Comparison: Hermes vs OpenClaw
Hermes Advantages
| Aspect | Winner | Reason |
|---|---|---|
| Personal companion | Hermes | Continuous learning, personalization |
| Repetitive task automation | Hermes | Skill learning adapts to workflows |
| Voice interaction | Hermes | Native voice support |
| Lightweight deployment | Hermes | 20MB vs 200MB+ |
| Signal support | Hermes | Better multi-platform |
| Local model support | Hermes | Works better with Ollama/llama.cpp |
OpenClaw Advantages
| Aspect | Winner | Reason |
|---|---|---|
| Multi-agent coordination | OpenClaw | Better fleet management |
| Browser automation | OpenClaw | More mature plugin ecosystem |
| Community/plugins | OpenClaw | 307k stars vs 6k |
| MCP ecosystem | OpenClaw | More mature |
Community Recommendation
"Use both. OpenClaw as the 'fleet commander' for multi-agent coordination, Hermes as your 'personal advisor' for one-on-one tasks."
User Experience Feedback
Positive
"Hermes optimizes for depth of learning. It is smaller, more opinionated, and built by a team that trains the underlying models."
"For repetitive workflows where agent improvement creates measurable value over time, Hermes is the stronger choice."
"It just works — installation to first conversation is minutes, not hours."
Areas for Improvement
- Token overhead transparency - Users surprised by costs
- Memory system education - Users expect automatic memory
- Local model guidance - Need better model recommendations
- Gateway debugging - Error messages can be cryptic
- Migration experience - OpenClaw migration has rough edges
Summary
Strengths:
- Self-improving skill system
- Excellent multi-platform support
- Strong memory architecture
- Good local model support
- Active development
Weaknesses:
- Token overhead can surprise users
- Some migration/tooling rough edges
- Documentation gaps for advanced features
- Memory system requires user education