
Gemma Models Feedback for Hermes Agent

Source reference: Reddit r/LocalLLaMA, HuggingFace blog, community discussions


Gemma 4 Support

Status: Day-0 ecosystem support confirmed

"We worked on making sure the new models work locally with agents like openclaw, hermes, pi, and open code. All thanks to llama.cpp!"

Source: https://huggingface.co/blog/gemma4
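Since the day-0 support lands through llama.cpp, serving the model locally for an agent harness is a single command against a GGUF build. A hypothetical invocation (the model filename and port are placeholders, not from the source):

```shell
# Serve a local GGUF build over an OpenAI-compatible API.
# The model path is a placeholder. --jinja applies the model's chat
# template, which llama.cpp uses for tool-call formatting; -ngl 99
# offloads all layers to the GPU; -c sets the context window.
llama-server -m ./gemma-4-26b-a4b-q8_0.gguf -c 8192 -ngl 99 --jinja --port 8080
```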


Gemma 4 vs Qwen 3.5 Comparison

Source: https://www.reddit.com/r/LocalLLaMA/comments/1scbpmo/so_qwen35_or_gemma_4/

Tool Use Issues

"Gemma keeps duplicating tool calls for some reason."

"Gemma is pretty fun to talk to, reminds me of the early model whimsy."

"Fixes for llama.cpp are happening in real-time so things may not be fair but so far Gemma is failing to complete the complex challenge which qwen can succeed at (24gb VRAM) it's just giving up and claiming it's succeeded when it hasn't."


Performance Notes

- Gemma 4 26B A4B Q8_0 on an M2 Ultra reaches ~300 t/s (with speculative-decoding caveats)
- llama.cpp support is still being fixed in real time
- Better suited to conversational use than to complex agentic tasks
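The speculative-decoding caveat on the ~300 t/s figure can be made concrete: effective throughput depends on how often the target model accepts the draft model's tokens. A back-of-envelope sketch using the standard geometric acceptance model (alpha = per-token acceptance probability, gamma = draft length; all numbers illustrative, not measured):

```python
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per target-model verification step.

    Standard speculative-decoding result under a geometric acceptance
    model: (1 - alpha**(gamma + 1)) / (1 - alpha).
    """
    if alpha >= 1.0:
        return gamma + 1  # every draft token accepted, plus one bonus token
    return (1 - alpha ** (gamma + 1)) / (1 - alpha)

def effective_tps(base_tps: float, alpha: float, gamma: int,
                  draft_cost_ratio: float) -> float:
    """Rough effective tokens/sec with a draft model in the loop.

    base_tps: target model decoding alone.
    draft_cost_ratio: time of one draft step relative to one target step.
    """
    step_time = (1 + gamma * draft_cost_ratio) / base_tps
    return expected_tokens_per_step(alpha, gamma) / step_time

# Example: a draft costing 10% of a target step with 80% acceptance
# more than doubles throughput at gamma = 4.
print(round(effective_tps(30, 0.8, 4, 0.1), 1))  # prints 72.0
```

The same math explains why the speedup evaporates on agentic workloads where acceptance drops: at alpha = 0.3 the expected tokens per step is barely above 1.4.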

Recommendation

For Hermes Agent specifically, community feedback suggests Qwen 3.5 currently outperforms Gemma 4 for:

- Tool use with novel tools
- Complex multi-step tasks
- Agent reliability

Gemma 4 may be preferable for:

- Conversational interactions
- Creative writing tasks
- Deployments where llama.cpp optimizations have matured