Harnesses under analysis:
- opencode (Go-based coding agent)
- pi (minimal terminal coding harness by Mario Zechner)
- hermes (Nous Research agent)
- forgecode (AI pair programmer with sub-agents)

Each harness folder contains:
- repo/: Source code from the respective repository
- feedback/localllm/: Community feedback for local/smaller models
- feedback/frontier/: Community feedback for frontier models

Research focus: Tool handling, skills systems, prompt engineering, context management, and best practices for smaller/local models.
Gemma Models Feedback for Hermes Agent
Source reference: Reddit r/LocalLLaMA, HuggingFace blog, community discussions
Gemma 4 Support
Status: Day-0 ecosystem support confirmed
"We worked on making sure the new models work locally with agents like openclaw, hermes, pi, and open code. All thanks to llama.cpp!"
Source: https://huggingface.co/blog/gemma4
Gemma 4 vs Qwen 3.5 Comparison
Source: https://www.reddit.com/r/LocalLLaMA/comments/1scbpmo/so_qwen35_or_gemma_4/
Tool Use Issues
"Gemma keeps duplicating tool calls for some reason."
"Gemma is pretty fun to talk to, reminds me of the early model whimsy."
"Fixes for llama.cpp are happening in real-time so things may not be fair, but so far Gemma is failing to complete the complex challenge which qwen can succeed at (24GB VRAM). It's just giving up and claiming it's succeeded when it hasn't."
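The duplicated-tool-call complaint above can be mitigated on the harness side. Below is a minimal sketch (not taken from any of these harnesses; all names are illustrative) that drops a tool call when it is identical, by name and arguments, to the immediately preceding call in the same turn:

```python
import json

def dedupe_tool_calls(calls):
    """Drop consecutive duplicate tool calls (same name + same arguments).

    A hypothetical harness-side guard for models that emit the same
    call twice in a row; non-adjacent repeats are intentionally kept,
    since re-reading a file later in a turn can be legitimate.
    """
    result = []
    prev_key = None
    for call in calls:
        # Canonicalize arguments so {"a":1,"b":2} and {"b":2,"a":1} compare equal.
        key = (call["name"], json.dumps(call["arguments"], sort_keys=True))
        if key != prev_key:
            result.append(call)
        prev_key = key
    return result

# Illustrative turn: the model repeats the first call verbatim.
calls = [
    {"name": "read_file", "arguments": {"path": "main.go"}},
    {"name": "read_file", "arguments": {"path": "main.go"}},  # duplicate, dropped
    {"name": "list_dir", "arguments": {"path": "."}},
]
print(len(dedupe_tool_calls(calls)))  # 2
```

Collapsing only adjacent duplicates is a deliberately conservative choice; a stricter per-turn set-based filter would also suppress legitimate repeat calls.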
Performance Notes
- Gemma 4 26B A4B Q8_0 on M2 Ultra achieves ~300 t/s (with speculative decoding caveats)
- llama.cpp support actively being fixed in real-time
- Better for conversational use than complex agentic tasks
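The other quoted failure mode, the model claiming success when the task is not done, is easier to catch outside the model. A hedged sketch (function and command names are assumptions, not any harness's actual API): before accepting a completion message, re-run the project's verification command and only mark the task done on a clean exit:

```python
import subprocess

def verify_completion(test_cmd=("go", "test", "./...")):
    """Return True only if the verification command exits with status 0.

    Hypothetical harness-side check: the default command assumes a Go
    project; a real harness would take this from project configuration.
    """
    result = subprocess.run(test_cmd, capture_output=True)
    return result.returncode == 0
```

A harness could then reject the "I'm done" turn and feed the failing output back to the model, e.g. `if not verify_completion(): continue_agent_loop()`.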
Recommendation
For Hermes Agent specifically, community feedback suggests Qwen 3.5 currently outperforms Gemma 4 for:
- Tool use with novel tools
- Complex multi-step tasks
- Agent reliability
Gemma 4 may be preferable for:
- Conversational interactions
- Creative writing tasks
- When llama.cpp optimizations mature