# Gemma Models Feedback for Hermes Agent

**Models Covered:** Gemma 4 (26B A4B)
**Provider:** Ollama, llama.cpp
**Harness:** Hermes
**Date Compiled:** April 9, 2026
**Source References:** Reddit r/LocalLLaMA, HuggingFace blog, community discussions
## Quick Reference
| Attribute | Value |
|---|---|
| Model | Gemma 4 26B A4B |
| Size | 26B parameters |
| Quantization | Q8_0 recommended |
| Best For | Conversational use, creative tasks |
| Not Recommended For | Complex agentic tasks (per community feedback) |
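As a quick-start sketch for the Q8_0 recommendation above, assuming the model is published under an Ollama tag like `gemma4:26b-a4b-q8_0` (a hypothetical name, not confirmed by the sources here):

```shell
# Hypothetical tag -- verify the published model name in the Ollama
# registry before pulling; only the Q8_0 quantization choice comes
# from the community feedback in this document.
ollama pull gemma4:26b-a4b-q8_0

# Smoke-test the model interactively before wiring it into Hermes.
ollama run gemma4:26b-a4b-q8_0 "Summarize this repo's README in two sentences."
```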
## Gemma 4 Support Status

**Status:** Day-0 ecosystem support confirmed

> "We worked on making sure the new models work locally with agents like openclaw, hermes, pi, and open code. All thanks to llama.cpp!"

Source: https://huggingface.co/blog/gemma4
## Benchmark Results

No specific benchmark results are available for the Hermes + Gemma 4 combination.
## What Worked Well

- **Ecosystem Support**
  - Day-0 support confirmed by HuggingFace
  - Works with Hermes, OpenClaw, pi, and OpenCode
- **Performance on Apple Silicon**
  - Gemma 4 26B A4B Q8_0 on an M2 Ultra achieves ~300 t/s
  - Note: with speculative decoding caveats
- **Conversational Quality**
  - "Gemma is pretty fun to talk to, reminds me of the early model whimsy."
  - Good for creative writing tasks
## Issues Encountered

- **Tool Call Duplication (Major)**
  - Description: Gemma keeps duplicating tool calls
  - Quote: "Gemma keeps duplicating tool calls for some reason."
  - Source: https://www.reddit.com/r/LocalLLaMA/comments/1scbpmo/so_qwen35_or_gemma_4/
- **Complex Task Completion (Major)**
  - Description: Fails to complete complex challenges that Qwen can succeed at
  - Quote: "Fixes for llama.cpp are happening in real-time so things may not be fair but so far Gemma is failing to complete the complex challenge which qwen can succeed at (24gb VRAM) it's just giving up and claiming it's succeeded when it hasn't."
  - Hardware: 24GB VRAM
- **llama.cpp Maturity (Minor)**
  - Support is actively being fixed in real-time
  - May improve with future updates
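Until the duplication issue is fixed upstream, one harness-level mitigation is to drop exact repeats of a tool call within a single turn before executing them. A minimal sketch, assuming an illustrative `{"name": ..., "arguments": {...}}` call shape (not the actual Hermes schema):

```python
import json


def dedupe_tool_calls(calls):
    """Drop tool calls that exactly repeat an earlier call in the same turn.

    Each call is assumed to be a dict like {"name": ..., "arguments": {...}};
    this shape is illustrative, not the real Hermes harness schema.
    """
    seen = set()
    unique = []
    for call in calls:
        # Canonicalize arguments so dict key order can't defeat the comparison.
        key = (call["name"], json.dumps(call["arguments"], sort_keys=True))
        if key not in seen:
            seen.add(key)
            unique.append(call)
    return unique
```

A real harness might instead only collapse *consecutive* repeats, since this version would also drop a legitimate retry of the same call later in the turn.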
## Comparison: Gemma 4 vs Qwen 3.5

Source: https://www.reddit.com/r/LocalLLaMA/comments/1scbpmo/so_qwen35_or_gemma_4/
| Aspect | Gemma 4 | Qwen 3.5 |
|---|---|---|
| Tool use with novel tools | Duplicates calls | Works well |
| Complex challenges | Gives up/fails | Succeeds |
| Conversational | Fun, whimsical | - |
| Agent reliability | Lower | Higher |
**Community Consensus:** For Hermes Agent specifically, Qwen 3.5 currently outperforms Gemma 4 for tool use and complex tasks.
## Recommendations

### Use Gemma 4 For:

- Conversational interactions
- Creative writing tasks
- When llama.cpp optimizations mature

### Use Qwen 3.5 Instead For:

- Tool use with novel tools
- Complex multi-step tasks
- Agent reliability
## Source References

- **HuggingFace Blog - Gemma 4**: https://huggingface.co/blog/gemma4
  - Day-0 ecosystem support announcement
- **Reddit r/LocalLLaMA - Qwen vs Gemma**: https://www.reddit.com/r/LocalLLaMA/comments/1scbpmo/so_qwen35_or_gemma_4/
  - Community comparison and tool use feedback