
Gemma Models Feedback for Hermes Agent

Models Covered: Gemma 4 (26B A4B)
Provider: Ollama, llama.cpp
Harness: Hermes
Date Compiled: April 9, 2026
Source References: Reddit r/LocalLLaMA, HuggingFace blog, community discussions


Quick Reference

| Attribute | Value |
|---|---|
| Model | Gemma 4 26B A4B |
| Size | 26B parameters |
| Quantization | Q8_0 recommended |
| Best For | Conversational use, creative tasks |
| Not Recommended For | Complex agentic tasks (per community feedback) |
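
Following the Q8_0 recommendation above, a model of this class might be pulled and run via Ollama roughly as follows. The model tag is a hypothetical placeholder, not a confirmed Ollama library name; check the published tag before use.

```shell
# Hypothetical model tag -- verify the actual name in the Ollama library.
ollama pull gemma4:26b-a4b-q8_0

# Start an interactive chat session with the quantized model.
ollama run gemma4:26b-a4b-q8_0
```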

Gemma 4 Support Status

Status: Day-0 ecosystem support confirmed

"We worked on making sure the new models work locally with agents like openclaw, hermes, pi, and open code. All thanks to llama.cpp!"

Source: https://huggingface.co/blog/gemma4


Benchmark Results

No specific benchmark results available for Hermes + Gemma 4 combination.


What Worked Well

  1. Ecosystem Support

    • Day-0 support confirmed by HuggingFace
    • Works with Hermes, OpenClaw, pi, and OpenCode
  2. Performance on Apple Silicon

    • Gemma 4 26B A4B Q8_0 on M2 Ultra achieves ~300 t/s
    • Note: figure reported with speculative decoding enabled, which carries caveats
  3. Conversational Quality

    • "Gemma is pretty fun to talk to, reminds me of the early model whimsy."
    • Good for creative writing tasks
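
The speculative-decoding caveat above refers to pairing the main model with a smaller draft model in llama.cpp. A sketch of such a server invocation, with hypothetical GGUF file names:

```shell
# Sketch: llama.cpp server with speculative decoding.
# GGUF file names are hypothetical placeholders.
# -m : main model; -md : smaller draft model; --draft-max : max draft tokens per step
llama-server -m gemma4-26b-a4b-q8_0.gguf -md gemma4-draft-q8_0.gguf --draft-max 16
```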

Issues Encountered

  1. Tool Call Duplication (Major)

    • Description: Duplicates tool calls when working with novel tools (per the community comparison below)

  2. Complex Task Completion (Major)

    • Description: Fails to complete complex challenges that Qwen can succeed at
    • Quote: "Fixes for llama.cpp are happening in real-time so things may not be fair but so far Gemma is failing to complete the complex challenge which qwen can succeed at (24gb VRAM) it's just giving up and claiming it's succeeded when it hasn't."
    • Hardware: 24GB VRAM
  3. llama.cpp Maturity (Minor)

    • Support actively being fixed in real-time
    • May improve with future updates
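
One possible harness-side mitigation for the tool-call duplication issue above is to drop repeated calls before dispatching them. A minimal sketch, assuming tool calls arrive as dicts with `name` and `arguments` keys (this shape is an assumption, not the Hermes harness API):

```python
import json


def dedupe_tool_calls(calls):
    """Filter out duplicate tool calls within one model turn.

    Two calls are considered duplicates when they share the same tool
    name and identical arguments. Argument dicts are serialized with
    sorted keys so key order does not affect the comparison.
    """
    seen = set()
    unique = []
    for call in calls:
        key = (call["name"], json.dumps(call.get("arguments", {}), sort_keys=True))
        if key in seen:
            continue  # duplicate: skip rather than dispatch the tool twice
        seen.add(key)
        unique.append(call)
    return unique
```

Whether dropping duplicates is safe depends on the tool: a repeated read is harmless, but a repeated write may be intentional, so a real harness might only dedupe known read-only tools.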

Comparison: Gemma 4 vs Qwen 3.5

Source: https://www.reddit.com/r/LocalLLaMA/comments/1scbpmo/so_qwen35_or_gemma_4/

| Aspect | Gemma 4 | Qwen 3.5 |
|---|---|---|
| Tool use with novel tools | Duplicates calls | Works well |
| Complex challenges | Gives up/fails | Succeeds |
| Conversational | Fun, whimsical | - |
| Agent reliability | Lower | Higher |

Community Consensus: For Hermes Agent specifically, Qwen 3.5 currently outperforms Gemma 4 for tool use and complex tasks.


Recommendations

Use Gemma 4 For:

  • Conversational interactions
  • Creative writing tasks
  • Re-evaluation once llama.cpp optimizations mature

Use Qwen 3.5 Instead For:

  • Tool use with novel tools
  • Complex multi-step tasks
  • Agent reliability

Source References

  1. HuggingFace Blog - Gemma 4: https://huggingface.co/blog/gemma4

    • Day-0 ecosystem support announcement
  2. Reddit r/LocalLLaMA - Qwen vs Gemma: https://www.reddit.com/r/LocalLLaMA/comments/1scbpmo/so_qwen35_or_gemma_4/

    • Community comparison and tool use feedback