
Bug Reports and Issues Collection

Collection Date: 2026-04-09
Source: GitHub Issues (NousResearch/hermes-agent)


Critical Issues

Issue #4146: Sandbox Code Execution Security Bypass (CRITICAL)

Status: Open
Severity: Critical

"Critical. Any LLM prompt injection or confused deputy scenario where the agent generates sandbox code could result in arbitrary command execution as the user."

Problem: execute_code sandbox bypasses dangerous command approval via terminal tool

Impact: Security vulnerability - sandboxed code can execute arbitrary commands

Recommended Fix: Remove terminal from SANDBOX_ALLOWED_TOOLS
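The recommended fix can be sketched as a simple allowlist filter. This is a minimal, hypothetical sketch: the tool names and the helper function are assumptions, and only the constant name SANDBOX_ALLOWED_TOOLS comes from the issue.

```python
# Hypothetical sketch: removing "terminal" from the sandbox allowlist
# means sandbox-generated code can no longer route shell commands around
# the dangerous-command approval flow. Tool names are assumed.
SANDBOX_ALLOWED_TOOLS = {"read_file", "write_file", "terminal"}

def sandbox_tools(allowed: set[str]) -> set[str]:
    """Return the sandbox allowlist with shell-capable tools removed."""
    return allowed - {"terminal"}
```

With this filter in place, `terminal` never reaches the sandboxed executor regardless of what the model asks for.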


Issue #1071: llama-server Compatibility (CRITICAL)

Status: Reported with fix
Error: 'dict' object has no attribute 'strip'

Environment: Windows 11 + Ubuntu/WSL2, llama-server with Qwen3.5-27B

Root Cause: llama-server returns function.arguments as dict instead of JSON string

Fix:

import json

# llama-server hands back already-parsed arguments; re-serialize so
# downstream code that calls .strip() on a JSON string keeps working.
if isinstance(args, (dict, list)):
    tc.function.arguments = json.dumps(args)

Gateway Issues

Issue #4469: Multiple Rapid Messages Only Last One Processed

Status: Open
Component: Gateway message queuing

Problem: When user sends multiple messages while agent is running, only the last message is processed

Root Cause: Two separate pending message storage locations:

  • GatewayRunner._pending_messages (written but never read)
  • adapter._pending_messages (read but never written during interrupts)

Impact: Orphaned message queue - user messages lost
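The fix implied by the root cause is to route every producer and consumer through one queue. The sketch below is hypothetical (class and method names are assumptions, not the hermes API); it only illustrates the single-queue discipline that prevents orphaned messages.

```python
from collections import deque

class PendingMessages:
    """Minimal single-queue sketch (hypothetical class name).

    The reported root cause is two disconnected queues: one written but
    never read, one read but never written during interrupts. Sharing
    one queue between the gateway runner and the adapter avoids orphans.
    """
    def __init__(self) -> None:
        self._pending: deque[str] = deque()

    def push(self, msg: str) -> None:
        # Called whenever a user message arrives mid-run.
        self._pending.append(msg)

    def drain(self) -> list[str]:
        # Called by the agent loop between turns: returns all queued
        # messages in arrival order instead of only the last one.
        msgs = list(self._pending)
        self._pending.clear()
        return msgs
```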

Issue #6212: Telegram Context Compaction Handoff Bug

Status: Open
Component: Telegram gateway

Problem: A fresh /start or "Hello?" message dumps the raw [CONTEXT COMPACTION] handoff instead of a normal greeting

Sessions Affected:

  • 20260408_111232_42b907
  • 20260408_113658_19c1fc

Expected: Short greeting or "Resuming prior task" message
Actual: Raw compaction summary dumped to user

Issue #5446: Discord Thread User Addition

Status: Open
Problem: User not added to private Discord thread when using /thread command


Authentication Issues

Issue #5807: Hermes Doctor Reports False "Not Logged In"

Status: Open
Component: Authentication status checking

Problem: hermes doctor reports "Nous Portal auth (not logged in)" even with valid credentials

Root Cause: get_nous_auth_status() only checks legacy providers section, not credential_pool

Workaround: Use hermes auth list for accurate status
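A fix would have the status check consult both credential locations. The sketch below is an assumption about shape: the config keys ("providers", "credential_pool", "api_key") and the function body are hypothetical; only the two-location root cause comes from the issue.

```python
# Hypothetical sketch of the fix: config structure and key names are
# assumptions, not the actual hermes schema.
def nous_auth_ok(config: dict) -> bool:
    """Logged in if credentials exist in EITHER the legacy providers
    section OR the newer credential_pool."""
    legacy = config.get("providers", {}).get("nous", {})
    pool = config.get("credential_pool", [])
    return bool(legacy.get("api_key")) or any(
        entry.get("provider") == "nous" for entry in pool
    )
```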


Migration Issues

Issue #5191: OpenClaw Migration Silent Failures

Status: Open
Component: Migration tool

Bug 1: Orphaned openclaw.json - migration renames directory but doesn't copy config

Bug 2: Missing Slack token migration - tokens not extracted to ~/.hermes/.env

Impact: Gateway starts in broken state with cryptic errors

Workaround:

# Restore the config the migration renamed away but never copied back:
cp ~/.openclaw.pre-migration-*/openclaw.json ~/.openclaw/openclaw.json
# Add to ~/.hermes/.env:
SLACK_BOT_TOKEN=xoxb-...
SLACK_APP_TOKEN=xapp-...

Configuration Issues

Issue #5528: Configurable Dangerous Command Patterns

Status: Feature Request
Type: Configuration enhancement

Problem: Dangerous-command approval patterns are hard-coded in tools/approval.py

Use Case: Users cannot mark installation-specific commands (e.g., systemctl restart hermes-gateway) as approval-required

Proposed Solution:

approvals:
  extra_dangerous_patterns:
    - pattern: "\\bsystemctl\\b.*\\brestart\\b.*hermes-gateway"
      description: "restart gateway service"
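On the matching side, the proposal amounts to merging user-configured patterns with the built-in list before the approval check. A minimal sketch, assuming a regex-based check: BUILTIN_PATTERNS and the function name are stand-ins, not the actual tools/approval.py code.

```python
import re

# Sketch of the proposed behavior; only the extra_dangerous_patterns
# idea comes from the issue. BUILTIN_PATTERNS stands in for the
# hard-coded list in tools/approval.py.
BUILTIN_PATTERNS = [r"\brm\b.*-rf"]

def needs_approval(command: str, extra_patterns: list[str]) -> bool:
    """True if the command matches a built-in or user-configured pattern."""
    return any(
        re.search(p, command) for p in BUILTIN_PATTERNS + extra_patterns
    )
```

With the example config loaded, `systemctl restart hermes-gateway` would trigger approval while `systemctl status hermes-gateway` would not.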

Performance Issues

Issue #4379: Token Overhead Analysis

Status: Documented/Under Discussion
Finding: 73% of every API call is fixed overhead (~13.9K tokens)

Breakdown:

  • Tool definitions: 8,759 tokens
  • System prompt: 5,176 tokens
  • Skills catalog: ~2,200 tokens (eagerly loaded)

Recommended Optimizations:

  1. Platform-aware tool filtering (messaging platforms don't need browser tools)
  2. Lazy skills loading (remove from system prompt)
  3. Compression tuning documentation
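The first optimization above can be sketched as a filter over the tool-definition map sent with each request. The tool names and platform labels below are assumptions for illustration; only the idea (messaging platforms don't need browser tools) comes from the issue.

```python
# Hypothetical sketch of optimization 1: tool names and platform labels
# are assumed, not taken from the hermes codebase.
BROWSER_TOOLS = {"browser_navigate", "browser_click", "browser_screenshot"}
MESSAGING_PLATFORMS = {"telegram", "discord", "slack"}

def tools_for_platform(all_tools: dict, platform: str) -> dict:
    """Drop browser-tool definitions on platforms that cannot use them,
    shaving their share of the ~8.8K tokens of tool definitions."""
    if platform in MESSAGING_PLATFORMS:
        return {k: v for k, v in all_tools.items() if k not in BROWSER_TOOLS}
    return dict(all_tools)
```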

Memory Issues

Issue #509: Cognitive Memory Operations

Status: Feature Request
Proposal: Add LLM-driven encoding, consolidation, adaptive recall & extraction

Goal: Self-maintaining knowledge base that compounds over time

Issue #3943: MemoryProvider Interface

Status: Feature Request
Proposal: Interface for long-term memory integrations


Summary Table

Issue  Severity  Status       Component
#4146  Critical  Open         Security
#1071  Critical  Fix Ready    Local Models
#4469  High      Open         Gateway
#6212  Medium    Open         Telegram
#5807  Medium    Open         Auth
#5191  Medium    Open         Migration
#4379  Medium    Documented   Performance
#5528  Low       Feature Req  Config
#509   Low       Feature Req  Memory
#3943  Low       Feature Req  Memory