Feature Feedback and User Experience

Collection Date: 2026-04-09
Sources: GitHub issues, blog posts, community discussions, documentation


Skills System

Positive Feedback

Self-Improvement Loop:

"The agent can transform what it learns into reusable skills, improve them through experience, store useful information, and even search for previous conversations."

Progressive Disclosure:

  • Level 0: Skill names/descriptions (~3,000 tokens)
  • Level 1: Full skill content when needed
  • Level 2: Specific reference files
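The three levels above can be sketched as a lazy loader: only the index is always in the prompt, and deeper content is fetched on demand. This is a minimal illustration, not Hermes's actual implementation; the `Skill` class, its fields, and the function names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    # Hypothetical structure; the real skill format is not shown in the source.
    name: str
    description: str
    body: str  # full instructions (Level 1)
    references: dict[str, str] = field(default_factory=dict)  # Level 2 files


def level0_index(skills: list[Skill]) -> str:
    """Level 0: one line per skill, always present in the prompt."""
    return "\n".join(f"- {s.name}: {s.description}" for s in skills)


def level1_content(skills: list[Skill], name: str) -> str:
    """Level 1: the full skill body, loaded only when the skill is selected."""
    return next(s.body for s in skills if s.name == name)


def level2_reference(skills: list[Skill], name: str, filename: str) -> str:
    """Level 2: a single reference file, loaded only when asked for."""
    skill = next(s for s in skills if s.name == name)
    return skill.references[filename]


skills = [
    Skill(
        "deploy",
        "Deploy the service to staging",
        "1. Run tests\n2. Push image",
        {"rollback.md": "Revert to the previous image tag."},
    ),
    Skill("audit", "Audit dependencies", "1. List packages\n2. Check advisories"),
]
index = level0_index(skills)
```

The point of the design is that the per-request cost scales with the number of skills (names and descriptions), not with their total size.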

Skill Creation:

  • Auto-generated after complex tasks (5+ tool calls)
  • Can be hand-written
  • Installable from Skills Hub
  • Shareable via agentskills.io format

Community Contributions

Awesome Hermes Agent: https://github.com/0xNyk/awesome-hermes-agent

  • Curated list of skills, tools, integrations
  • Four plugins covering common operational needs
  • Inter-agent bridge for multiple Hermes instances
  • Hermes-skill-factory (auto-generates skills from workflows)

Memory System

Architecture

Three Layers:

  1. Short-term - Recent context in conversation
  2. Long-term - MEMORY.md (facts, conventions, lessons)
  3. Episodic - SQLite FTS5 search across all sessions

Storage:

  • MEMORY.md (~2,200 chars) - Always in context
  • USER.md (~1,375 chars) - User preferences
  • ~/.hermes/state.db - SQLite with full-text search
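The episodic layer relies on SQLite's FTS5 extension. A minimal sketch of that kind of search follows, using Python's standard `sqlite3` module and assuming the local SQLite build includes FTS5. The table name, columns, and sample rows are hypothetical; the actual schema of state.db is not described in the source.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a file such as state.db
# Hypothetical schema: one row per message, indexed for full-text search.
conn.execute("CREATE VIRTUAL TABLE messages USING fts5(session_id, content)")
conn.executemany(
    "INSERT INTO messages (session_id, content) VALUES (?, ?)",
    [
        ("s1", "The production database runs on port 5433"),
        ("s2", "We discussed the weekly dependency audit"),
        ("s3", "Telegram gateway setup notes"),
    ],
)


def search_sessions(query: str, limit: int = 5) -> list[tuple[str, str]]:
    """Return (session_id, content) rows ranked by FTS5 relevance."""
    return conn.execute(
        "SELECT session_id, content FROM messages "
        "WHERE messages MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()


hits = search_sessions("database")
```

Because the search is an explicit query, nothing is recalled unless the agent decides to run it, which matches the behaviour described in the next section.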

User Confusion Points

Source: https://vectorize.io/articles/hermes-agent-memory-not-working

"Memory is for critical facts that should always be in context. Session search is for 'did we discuss X last week?' queries where the agent needs to recall — it doesn't happen automatically before every response."

Common Misconception: Agent should automatically remember everything

Reality: User must explicitly ask agent to remember: "Remember that my production database runs on port 5433"
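One way to picture an explicit "remember" request is as an append to a small, size-capped file. The sketch below uses the approximate 2,200-character budget quoted for MEMORY.md; the `remember` function and its refuse-on-overflow behaviour are assumptions for illustration, not Hermes's documented logic.

```python
MEMORY_CHAR_LIMIT = 2200  # approximate budget quoted above for MEMORY.md


def remember(memory_text: str, fact: str, limit: int = MEMORY_CHAR_LIMIT) -> str:
    """Append one fact as a bullet line, refusing to exceed the budget.

    Hypothetical helper: how the real agent handles overflow is not
    stated in the source.
    """
    entry = f"- {fact.strip()}"
    updated = f"{memory_text.rstrip()}\n{entry}" if memory_text.strip() else entry
    if len(updated) > limit:
        raise ValueError("memory budget exceeded; consolidate entries first")
    return updated


memory = remember("", "Production database runs on port 5433")
memory = remember(memory, "Prefers pytest over unittest")
```

A hard cap like this explains why the always-in-context memory is reserved for critical facts, while everything else is left to session search.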


Delegation and Subagents

Performance Benefits

"Use delegate_task with parallel subtasks. Each subagent runs independently with its own context, and only the final summaries come back — massively reducing your main conversation's token usage."

Best Practices

  1. Set max_iterations lower for simple tasks (default: 50)
  2. Be specific in goals - "Fix the TypeError in api/handlers.py line 47" not "Fix the bug"
  3. Include file paths - Subagents don't know your project structure
  4. Use for context isolation - Prevents main conversation bloat
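The practices above can be sketched as follows. This is not the real delegate_task API: `Subtask`, `run_subagent`, and `delegate` are hypothetical stand-ins that only show the shape of the pattern, where subtasks run in parallel and the parent receives nothing but short summaries.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Subtask:
    goal: str  # specific, with file paths the subagent cannot guess
    max_iterations: int = 50  # default quoted above; lower for simple tasks


def run_subagent(task: Subtask) -> str:
    """Hypothetical stand-in for a subagent run.

    A real subagent would work in its own context; here we only return
    the short summary that would flow back to the parent conversation.
    """
    return f"[summary] {task.goal}"


def delegate(tasks: list[Subtask]) -> list[str]:
    """Run subtasks in parallel and collect only their summaries, in order."""
    with ThreadPoolExecutor(max_workers=len(tasks) or 1) as pool:
        return list(pool.map(run_subagent, tasks))


summaries = delegate(
    [
        Subtask("Fix the TypeError in api/handlers.py line 47", max_iterations=10),
        Subtask("Summarize open TODOs in src/utils.py"),
    ]
)
```

The token saving comes from what is left out: the intermediate tool calls and file contents stay in each subagent's context and never reach the main conversation.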

Multi-Agent Architecture (Future)

Issue #344 Proposal:

  • L0: Current (exists today)
  • L1: Workflow engine
  • L2: Checkpointing and recovery
  • L3: Full orchestration

Cron and Scheduling

Use Cases

Examples:

"Every morning at 9am, check Hacker News for AI news and send me a summary on Telegram."

"Weekly dependency audit every Sunday at 6 AM"

Features

  • Output automatically delivered to configured platform
  • Job output saved to ~/.hermes/cron/output/<job-id>/<timestamp>.md
  • Test with /cron run <job_id> before scheduling
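For reference, the two quoted schedules correspond to the standard five-field cron expressions below, and the output location can be built from the path pattern above. Whether Hermes accepts raw cron syntax is not confirmed by the source, and the job IDs and timestamp format here are assumptions.

```python
from datetime import datetime
from pathlib import Path

# Standard five-field cron expressions for the two quoted examples.
# The job IDs are hypothetical.
SCHEDULES = {
    "daily-hn-summary": "0 9 * * *",  # every day at 09:00
    "weekly-dep-audit": "0 6 * * 0",  # Sundays at 06:00
}


def output_path(job_id: str, run_time: datetime, home: Path) -> Path:
    """Build <home>/.hermes/cron/output/<job-id>/<timestamp>.md.

    The timestamp format is an assumption; the source does not specify it.
    """
    stamp = run_time.strftime("%Y%m%dT%H%M%S")
    return home / ".hermes" / "cron" / "output" / job_id / f"{stamp}.md"


path = output_path(
    "weekly-dep-audit", datetime(2026, 4, 12, 6, 0, 0), Path("/home/user")
)
```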

Limitations

  • Agent only sees script stdout
  • Background execution requires proper setup

Gateway and Messaging

Supported Platforms

Full List:

  • Telegram
  • Discord
  • Slack
  • WhatsApp
  • Signal
  • Email
  • SMS
  • Home Assistant
  • Matrix/Mattermost
  • DingTalk/Feishu/WeCom

Cross-Platform Continuity

"Instructions are given via Telegram in the morning, and progress is checked via Discord at night. It's seamless."

Voice Support

  • Voice memo transcription on all platforms
  • TTS output with /voice command
  • Discord voice channel support

Terminal Backends

Options

  1. Local (default)
  2. Docker (sandboxed)
  3. SSH (remote server)
  4. Daytona (serverless persistence)
  5. Singularity
  6. Modal (serverless, hibernates when idle)

Security

  • Container hardening with read-only root
  • Dropped capabilities
  • Namespace isolation
  • Dangerous command approval system
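As a rough illustration of these measures, the sketch below builds a `docker run` command line with a read-only root and dropped capabilities, and gates risky commands behind an approval check. The Docker flags are standard, but the exact flags Hermes passes are not given in the source, and the pattern list and function names are hypothetical.

```python
import re


def hardened_docker_argv(image: str, command: list[str]) -> list[str]:
    """Assemble a docker run invocation with common hardening flags."""
    return [
        "docker", "run", "--rm",
        "--read-only",                          # read-only root filesystem
        "--cap-drop", "ALL",                    # drop all Linux capabilities
        "--security-opt", "no-new-privileges",  # block privilege escalation
        "--tmpfs", "/tmp",                      # writable scratch space only
        image, *command,
    ]


# Hypothetical patterns; a real approval list would be longer.
DANGEROUS_PATTERNS = [r"\brm\s+-rf\b", r"\bmkfs\b", r"\bdd\s+if="]


def needs_approval(shell_command: str) -> bool:
    """Return True if the command matches a known-dangerous pattern."""
    return any(re.search(p, shell_command) for p in DANGEROUS_PATTERNS)


argv = hardened_docker_argv("python:3.12-slim", ["echo", "hi"])
```

Namespace isolation needs no extra flag here, since Docker containers get their own namespaces by default.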

Browser and Vision

Browser Tools

Set:

  • browser_navigate
  • browser_click
  • browser_snapshot
  • browser_type
  • Others not listed here (11 tools total)

Cost Impact:

  • Browser tool definitions add ~1,258 tokens to every request, even in messaging sessions that never use them
  • Screenshots + vision analysis are high-token operations
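The fixed overhead adds up with request volume. A quick back-of-the-envelope calculation is sketched below; the ~1,258-token figure comes from the feedback above, while the request count and the price per million tokens are placeholder assumptions to be replaced with your own numbers.

```python
BROWSER_TOOL_OVERHEAD_TOKENS = 1258  # approximate figure quoted above


def overhead_tokens(requests: int) -> int:
    """Total extra input tokens spent on unused browser tool definitions."""
    return requests * BROWSER_TOOL_OVERHEAD_TOKENS


def overhead_cost(requests: int, price_per_million_tokens: float) -> float:
    """Cost of that overhead at a given input-token price (assumed value)."""
    return overhead_tokens(requests) / 1_000_000 * price_per_million_tokens


tokens = overhead_tokens(1000)  # 1,258,000 tokens over 1,000 requests
cost = overhead_cost(1000, price_per_million_tokens=3.0)  # hypothetical price
```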

Vision Analysis

Supported:

  • Image URLs via vision_analyze
  • Image paste in CLI (with xclip/x11 forwarding)
  • Images via messaging platforms

Voice Mode

Features

  • STT: faster-whisper (local, free)
  • TTS: Microsoft Edge TTS (free)
  • Recording: Ctrl+B in CLI
  • Cross-platform: Works in Telegram, Discord, etc.

Comparison: Hermes vs OpenClaw

Hermes Advantages

| Aspect | Winner | Reason |
| --- | --- | --- |
| Personal companion | Hermes | Continuous learning, personalization |
| Repetitive task automation | Hermes | Skill learning adapts to workflows |
| Voice interaction | Hermes | Native voice support |
| Lightweight deployment | Hermes | 20MB vs 200MB+ |
| Signal support | Hermes | Better multi-platform |
| Local model support | Hermes | Works better with Ollama/llama.cpp |

OpenClaw Advantages

| Aspect | Winner | Reason |
| --- | --- | --- |
| Multi-agent coordination | OpenClaw | Better fleet management |
| Browser automation | OpenClaw | More mature plugin ecosystem |
| Community/plugins | OpenClaw | 307k stars vs 6k |
| MCP ecosystem | OpenClaw | More mature |

Community Recommendation

"Use both. OpenClaw as the 'fleet commander' for multi-agent coordination, Hermes as your 'personal advisor' for one-on-one tasks."


User Experience Feedback

Positive

"Hermes optimizes for depth of learning. It is smaller, more opinionated, and built by a team that trains the underlying models."

"For repetitive workflows where agent improvement creates measurable value over time, Hermes is the stronger choice."

"It just works — installation to first conversation is minutes, not hours."

Areas for Improvement

  1. Token overhead transparency - Users surprised by costs
  2. Memory system education - Users expect automatic memory
  3. Local model guidance - Need better model recommendations
  4. Gateway debugging - Error messages can be cryptic
  5. Migration experience - OpenClaw migration has rough edges

Summary

Strengths:

  • Self-improving skill system
  • Excellent multi-platform support
  • Strong memory architecture
  • Good local model support
  • Active development

Weaknesses:

  • Token overhead can surprise users
  • Some migration/tooling rough edges
  • Documentation gaps for advanced features
  • Memory system requires user education