
Gemma Models Feedback for Hermes Agent

Source reference: Reddit r/LocalLLaMA, HuggingFace blog, community discussions


Gemma 4 Support

Status: Day-0 ecosystem support confirmed

"We worked on making sure the new models work locally with agents like openclaw, hermes, pi, and open code. All thanks to llama.cpp!"

Source: https://huggingface.co/blog/gemma4
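Since the day-0 support lands through llama.cpp, serving the model locally for an agent harness is a single command against a GGUF build. A hypothetical invocation (the model filename and port are placeholders, not from the source):

```shell
# Serve a local GGUF build over an OpenAI-compatible API.
# The model path is a placeholder. --jinja applies the model's chat
# template, which llama.cpp uses for tool-call formatting; -ngl 99
# offloads all layers to the GPU; -c sets the context window.
llama-server -m ./gemma-4-26b-a4b-q8_0.gguf -c 8192 -ngl 99 --jinja --port 8080
```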


Gemma 4 vs Qwen 3.5 Comparison

Source: https://www.reddit.com/r/LocalLLaMA/comments/1scbpmo/so_qwen35_or_gemma_4/

Tool Use Issues

"Gemma keeps duplicating tool calls for some reason."

"Gemma is pretty fun to talk to, reminds me of the early model whimsy."

"Fixes for llama.cpp are happening in real-time so things may not be fair but so far Gemma is failing to complete the complex challenge which qwen can succeed at (24gb VRAM) it's just giving up and claiming it's succeeded when it hasn't."


Performance Notes

- Gemma 4 26B A4B Q8_0 on an M2 Ultra reaches ~300 t/s (with speculative-decoding caveats)
- llama.cpp support is still being fixed in real time
- Better suited to conversational use than to complex agentic tasks
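The speculative-decoding caveat on the ~300 t/s figure can be made concrete: effective throughput depends on how often the target model accepts the draft model's tokens. A back-of-envelope sketch using the standard geometric acceptance model (alpha = per-token acceptance probability, gamma = draft length; all numbers illustrative, not measured):

```python
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per target-model verification step.

    Standard speculative-decoding result under a geometric acceptance
    model: (1 - alpha**(gamma + 1)) / (1 - alpha).
    """
    if alpha >= 1.0:
        return gamma + 1  # every draft token accepted, plus one bonus token
    return (1 - alpha ** (gamma + 1)) / (1 - alpha)

def effective_tps(base_tps: float, alpha: float, gamma: int,
                  draft_cost_ratio: float) -> float:
    """Rough effective tokens/sec with a draft model in the loop.

    base_tps: target model decoding alone.
    draft_cost_ratio: time of one draft step relative to one target step.
    """
    step_time = (1 + gamma * draft_cost_ratio) / base_tps
    return expected_tokens_per_step(alpha, gamma) / step_time

# Example: a draft costing 10% of a target step with 80% acceptance
# more than doubles throughput at gamma = 4.
print(round(effective_tps(30, 0.8, 4, 0.1), 1))  # prints 72.0
```

The same math explains why the speedup evaporates on agentic workloads where acceptance drops: at alpha = 0.3 the expected tokens per step is barely above 1.4.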

Recommendation

For Hermes Agent specifically, community feedback suggests Qwen 3.5 currently outperforms Gemma 4 for:

- Tool use with novel tools
- Complex multi-step tasks
- Agent reliability

Gemma 4 may be preferable for:

- Conversational interactions
- Creative writing tasks
- Deployments where llama.cpp optimizations have matured