# General Frontier Model Feedback

**Collection Date:** 2026-04-09
**Sources:** GitHub issues, blog posts, community discussions, official documentation

---

## Provider Support Matrix

| Provider | Status | Special Features |
|----------|--------|------------------|
| OpenAI | ✅ Full | Codex OAuth support |
| Anthropic | ✅ Full | Claude Code credential store |
| OpenRouter | ✅ Full | 200+ models, flexible |
| Nous Portal | ✅ Full | OAuth, subscription |
| Kimi/Moonshot | ✅ Full | 75% cache discount |
| DeepSeek | ✅ Full | 90% cache discount |
| MiniMax | ✅ Full | Token plan support |
| z.ai/GLM | ✅ Full | China/global endpoints |
| Gemini | ✅ Full | Via OpenRouter or direct |

---

## Key Feedback Themes

### 1. Token Overhead Is the Hidden Cost

**Critical Issue:** Every API call includes ~13.9K tokens of fixed overhead

**Source:** GitHub Issue #4379

> "The dollar cost depends on the model/provider, but the token overhead is constant — these 13.9K tokens are sent on every call whether using Sonnet, Haiku, Llama, or any other model via OpenRouter."

**Breakdown (of a typical ~19K-token call):**
- Tool definitions: 8,759 tokens (46%)
- System prompt: 5,176 tokens (27%)
- Actual messages: ~5,000 tokens (27%)

The tool definitions and system prompt together make up the ~13.9K tokens of fixed overhead; only the messages vary per request.

**Impact on Costs:**
- A "simple weather query" can cost 21,000 tokens when the agent spawns a terminal
- One user reported: "4 million tokens in 2 hours of light usage"

### 2. CLI vs. Gateway Token Disparity (Fixed in v0.6.0)

**Bug (pre-v0.6.0):** Telegram used 2-3x more tokens than the CLI

| Access Method | Tokens/Request |
|---------------|----------------|
| CLI | 6,000-8,000 |
| Telegram (old) | 15,000-20,000 |

**Root Cause:** The gateway started in the repo directory instead of the home directory

**Fix:** Update to v0.6.0+ and restart the gateway

### 3. Tool Reliability by Provider

**Most Reliable:**
1. Claude Sonnet (excellent tool calling)
2. GPT-4 class models (very reliable)
3. Kimi K2.5 (good for the price)

**Acceptable:**
- MiniMax
- DeepSeek
- Gemini

**Variable:**
- Depends on specific task complexity
- Budget models may struggle with novel tools

---

## Cost Management Strategies

### Strategy 1: Tiered Model Usage

```
Complex reasoning → Claude Sonnet / GPT-4
Routine tasks → Kimi K2.5 / MiniMax
Vision tasks → Gemini Flash / GPT-4o
Maximum savings → DeepSeek with cache
```

### Strategy 2: Session Management

- Use `hermes --fresh` for unrelated tasks
- Run token-intensive work in the CLI rather than through the gateway
- Monitor with the `/usage` command

### Strategy 3: Toolset Optimization

- Disable unused skill categories (~2,200 tokens saved)
- Use platform-specific toolsets (~1,300 tokens saved)
- Keep MEMORY.md lean

---

## Provider-Specific Notes

### OpenRouter
- **Best for:** Flexibility, trying different models
- **Pros:** 200+ models, single API key
- **Cons:** Cache support depends on upstream

### Anthropic/Claude
- **Best for:** Complex reasoning, reliability
- **Pros:** Excellent tool calling, context understanding
- **Cons:** Higher cost, no special cache discounts

### Nous Portal
- **Best for:** Supporting the project, native integration
- **Pros:** OAuth, built-in support
- **Cons:** Subscription model

### Budget Providers (Kimi, DeepSeek, MiniMax)
- **Best for:** High volume, routine tasks
- **Pros:** 50-90% cost savings, fast
- **Cons:** May struggle with complex tasks

---

## Community Quotes

**On Cost:**
> "Choosing a cache-friendly provider is the single biggest lever for reducing costs."

**On Performance:**
> "One developer reported that a task taking OpenClaw 50+ tool calls and steps took Hermes 5 correct tool calls and finished 2.5 minutes faster."

**On Model Selection:**
> "For most users: Kimi K2.5 from Moonshot or MiniMax as a daily driver — both are fast, capable, and inexpensive."
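The tiered-usage idea from Strategy 1 can be sketched as a small routing table. This is purely illustrative: the `route_model` helper and the model identifier strings are hypothetical, not part of any real API; the tier-to-model mapping mirrors the strategy block above.

```python
# Illustrative sketch of Strategy 1 (tiered model usage).
# Model name strings and route_model() are hypothetical examples,
# not real API identifiers.

TIERS = {
    "complex_reasoning": "claude-sonnet",  # or a GPT-4-class model
    "routine": "kimi-k2.5",                # or MiniMax
    "vision": "gemini-flash",              # or GPT-4o
    "budget": "deepseek-chat",             # maximum savings with cache
}

def route_model(task_type: str) -> str:
    """Pick a model for a task category, defaulting to the budget tier."""
    return TIERS.get(task_type, TIERS["budget"])

print(route_model("routine"))       # kimi-k2.5
print(route_model("misc_chatter"))  # deepseek-chat (fallback)
```

The point of the fallback to the budget tier is that misclassified tasks fail cheap: a routine task sent to an expensive model wastes money, while a hard task sent to a budget model can simply be retried one tier up.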
---

## Summary Table

| Provider | Cost | Reliability | Speed | Cache |
|----------|------|-------------|-------|-------|
| Claude Sonnet | $$$$ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | Partial |
| GPT-4 | $$$$ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | Partial |
| Kimi K2.5 | $$ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | 75% |
| DeepSeek | $ | ⭐⭐⭐ | ⭐⭐⭐⭐ | 90% |
| MiniMax | $$ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | No |
| OpenRouter | Varies | Varies | Varies | Varies |

**Legend:**
- $ = Budget friendly (more $ = more expensive)
- ⭐ = Performance rating (out of 5)
- Cache % = Discount on cached tokens
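The Cache column can be connected back to the ~13.9K-token fixed overhead with some back-of-the-envelope integer arithmetic. The sketch below assumes, simplifying, that the entire fixed prefix (tool definitions + system prompt) is a cache hit on repeat calls and that cached tokens bill at the discounted rate; real cache behavior varies by provider.

```python
# Rough arithmetic: tokens effectively billed per call for the fixed
# overhead, under a provider's cache discount. The 13,935 figure is
# tool definitions (8,759) + system prompt (5,176) from the breakdown
# above; discount rates are from the provider matrix. Assumes the whole
# fixed prefix is a cache hit, which is a simplification.

FIXED_OVERHEAD = 13_935  # tokens, per call

def effective_overhead(discount_pct: int) -> int:
    """Overhead tokens effectively billed, given a cache discount in %."""
    return FIXED_OVERHEAD * (100 - discount_pct) // 100

print(effective_overhead(0))   # no cache:       13935
print(effective_overhead(75))  # Kimi/Moonshot:   3483
print(effective_overhead(90))  # DeepSeek:        1393
```

Under these assumptions, a 90% cache discount shrinks the per-call overhead from ~13.9K to ~1.4K effective tokens, which is why the community quote above calls a cache-friendly provider "the single biggest lever for reducing costs."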