# Gemma Models Feedback for Hermes Agent

**Models Covered:** Gemma 4 (26B A4B)
**Provider:** Ollama, llama.cpp
**Harness:** Hermes
**Date Compiled:** April 9, 2026
**Source References:** Reddit r/LocalLLaMA, HuggingFace blog, community discussions
## Quick Reference
| Attribute | Value |
|---|---|
| Model | Gemma 4 26B A4B |
| Size | 26B parameters |
| Quantization | Q8_0 recommended |
| Best For | Conversational use, creative tasks |
| Not Recommended For | Complex agentic tasks (per community feedback) |
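As a quick-start sketch for the Q8_0 recommendation above, assuming the model is published under an Ollama tag like `gemma4:26b-a4b-q8_0` (a hypothetical name, not confirmed by the sources here):

```shell
# Hypothetical tag -- verify the published model name in the Ollama
# registry before pulling; only the Q8_0 quantization choice comes
# from the community feedback in this document.
ollama pull gemma4:26b-a4b-q8_0

# Smoke-test the model interactively before wiring it into Hermes.
ollama run gemma4:26b-a4b-q8_0 "Summarize this repo's README in two sentences."
```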
## Gemma 4 Support Status

**Status:** Day-0 ecosystem support confirmed

> "We worked on making sure the new models work locally with agents like openclaw, hermes, pi, and open code. All thanks to llama.cpp!"

Source: https://huggingface.co/blog/gemma4
## Benchmark Results

No specific benchmark results are available for the Hermes + Gemma 4 combination.
## What Worked Well

- **Ecosystem Support**
  - Day-0 support confirmed by HuggingFace
  - Works with Hermes, OpenClaw, pi, and OpenCode
- **Performance on Apple Silicon**
  - Gemma 4 26B A4B Q8_0 on an M2 Ultra achieves ~300 t/s
  - Note: with speculative decoding caveats
- **Conversational Quality**
  - "Gemma is pretty fun to talk to, reminds me of the early model whimsy."
  - Good for creative writing tasks
## Issues Encountered

- **Tool Call Duplication (Major)**
  - Description: Gemma keeps duplicating tool calls
  - Quote: "Gemma keeps duplicating tool calls for some reason."
  - Source: https://www.reddit.com/r/LocalLLaMA/comments/1scbpmo/so_qwen35_or_gemma_4/
- **Complex Task Completion (Major)**
  - Description: Fails to complete complex challenges that Qwen can succeed at
  - Quote: "Fixes for llama.cpp are happening in real-time so things may not be fair but so far Gemma is failing to complete the complex challenge which qwen can succeed at (24gb VRAM) it's just giving up and claiming it's succeeded when it hasn't."
  - Hardware: 24GB VRAM
- **llama.cpp Maturity (Minor)**
  - Support is actively being fixed in real-time
  - May improve with future updates
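Until the duplication issue is fixed upstream, one harness-level mitigation is to drop exact repeats of a tool call within a single turn before executing them. A minimal sketch, assuming an illustrative `{"name": ..., "arguments": {...}}` call shape (not the actual Hermes schema):

```python
import json


def dedupe_tool_calls(calls):
    """Drop tool calls that exactly repeat an earlier call in the same turn.

    Each call is assumed to be a dict like {"name": ..., "arguments": {...}};
    this shape is illustrative, not the real Hermes harness schema.
    """
    seen = set()
    unique = []
    for call in calls:
        # Canonicalize arguments so dict key order can't defeat the comparison.
        key = (call["name"], json.dumps(call["arguments"], sort_keys=True))
        if key not in seen:
            seen.add(key)
            unique.append(call)
    return unique
```

A real harness might instead only collapse *consecutive* repeats, since this version would also drop a legitimate retry of the same call later in the turn.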
## Comparison: Gemma 4 vs Qwen 3.5

Source: https://www.reddit.com/r/LocalLLaMA/comments/1scbpmo/so_qwen35_or_gemma_4/
| Aspect | Gemma 4 | Qwen 3.5 |
|---|---|---|
| Tool use with novel tools | Duplicates calls | Works well |
| Complex challenges | Gives up/fails | Succeeds |
| Conversational | Fun, whimsical | - |
| Agent reliability | Lower | Higher |
**Community Consensus:** For Hermes Agent specifically, Qwen 3.5 currently outperforms Gemma 4 for tool use and complex tasks.
## Recommendations

### Use Gemma 4 For:

- Conversational interactions
- Creative writing tasks
- When llama.cpp optimizations mature

### Use Qwen 3.5 Instead For:

- Tool use with novel tools
- Complex multi-step tasks
- Agent reliability
## Source References

- **HuggingFace Blog - Gemma 4**: https://huggingface.co/blog/gemma4
  - Day-0 ecosystem support announcement
- **Reddit r/LocalLLaMA - Qwen vs Gemma**: https://www.reddit.com/r/LocalLLaMA/comments/1scbpmo/so_qwen35_or_gemma_4/
  - Community comparison and tool use feedback