Harnesses under analysis:
- opencode (Go-based coding agent)
- pi (minimal terminal coding harness by Mario Zechner)
- hermes (Nous Research agent)
- forgecode (AI pair programmer with sub-agents)

Each harness folder contains:
- repo/: Source code from respective repositories
- feedback/localllm/: Community feedback for local/smaller models
- feedback/frontier/: Community feedback for frontier models

Research focus: Tool handling, skills systems, prompt engineering, context management, and best practices for smaller/local models.
Budget Providers Feedback (Kimi, DeepSeek, MiniMax)
Source reference: Community guides, official integration docs, API documentation
Kimi / Moonshot AI (K2.5)
Recommendation: Primary budget-friendly option
Why Kimi K2.5?
Source: https://hermes-agent.ai/blog/hermes-agent-api-keys
"For most users: Kimi K2.5 from Moonshot or MiniMax as a daily driver — both are fast, capable, and inexpensive. Use Claude Sonnet or GPT-4 only for complex reasoning tasks where the extra capability is worth the significantly higher per-token cost."
Caching Benefits
| Provider | Cache Discount |
|---|---|
| Kimi K2.5 | 75% off on cache hits |
| DeepSeek | 90% off on cache hits |
| Claude/Anthropic | Full price (no special discount) |
Cost Comparison
Feature implementation scenario (~100 API calls):
- Claude Sonnet 4.5: ~$34
- Kimi K2.5: ~$3-8 (depending on caching)
- DeepSeek (cache hits): Under $1
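The gap above falls out of the cache arithmetic. A minimal sketch, using illustrative per-million-token prices and an assumed 80% cache-hit ratio (these figures are assumptions for the example, not the providers' published rates):

```python
# Illustrative cost arithmetic for a caching scenario. Prices, token
# counts, and hit ratios below are assumptions, not official rates.

def scenario_cost(price_per_mtok: float, cache_discount: float,
                  cache_hit_ratio: float, total_mtok: float) -> float:
    """Blend full-price and discounted tokens into one scenario cost."""
    cached = total_mtok * cache_hit_ratio * price_per_mtok * (1 - cache_discount)
    fresh = total_mtok * (1 - cache_hit_ratio) * price_per_mtok
    return cached + fresh

# Assume ~100 calls consuming 10M input tokens total, 80% cache hits.
TOTAL_MTOK = 10.0
for name, price, discount in [
    ("No cache discount (Claude-style)", 3.00, 0.00),
    ("75% off cache hits (Kimi-style)", 0.60, 0.75),
    ("90% off cache hits (DeepSeek-style)", 0.27, 0.90),
]:
    cost = scenario_cost(price, discount, 0.8, TOTAL_MTOK)
    print(f"{name}: ${cost:.2f}")
```

Even with made-up prices, the shape matches the comparison above: the cheaper base rate and the steeper cache discount compound, which is why the cached-DeepSeek case lands an order of magnitude below the no-discount case.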
DeepSeek
Best for: Maximum cost savings with caching
Caching Advantage
Source: https://hermes-agent.ai/blog/hermes-agent-token-overhead
"DeepSeek (90% off on cache) — Biggest cost lever"
Use Cases
- Routine file organization
- Simple message responses
- Cron job executions
- Research lookups
MiniMax
Integration: Official partnership/support
Source: https://platform.minimax.io/docs/token-plan/hermes-agent
"Use MiniMax-M2.7 in Hermes Agent for autonomous AI-powered development."
Token Plan
- Billed separately from pay-as-you-go API keys
- Subscribe to a Token Plan first
- Then create a Token Plan API key from the Token Plan page
Other Budget Options
Z.AI / ZhipuAI (GLM Models)
- Good for Chinese language tasks
- Competitive pricing
- OpenAI-compatible endpoint
Alibaba Cloud DashScope
- Qwen model access
- Regional availability advantages
OpenCode Zen / Go
- Curated model access
- Budget-friendly options
Provider Selection Strategy
Tier 1: Daily Driver (High Volume, Lower Cost)
- Kimi K2.5 - 75% cache discount, good capabilities
- DeepSeek - 90% cache discount, cheapest option
- MiniMax - Fast, capable, inexpensive
Tier 2: Complex Tasks (Selective Use)
- Claude Sonnet - Best reasoning, highest cost
- GPT-4 - Good for specific use cases
Tier 3: Auxiliary Tasks
- Gemini Flash - Vision tasks, cheap
- Local models - Free but require hardware
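The three tiers above can be sketched as a simple routing table. The task labels and fallback rule here are illustrative, not any harness's actual API:

```python
# Hypothetical tier-based model routing. Task labels and model
# identifiers are illustrative placeholders, not real config keys.

TIERS = {
    "routine": "moonshot/kimi-k2.5",       # Tier 1: daily driver
    "cheapest": "deepseek/deepseek-chat",  # Tier 1: max cache savings
    "complex": "anthropic/claude-sonnet",  # Tier 2: hard reasoning
    "vision": "google/gemini-2.5-flash",   # Tier 3: cheap vision
}

def pick_model(task_kind: str) -> str:
    """Route known task kinds to their tier; default to the daily driver."""
    return TIERS.get(task_kind, TIERS["routine"])

print(pick_model("complex"))       # Tier 2 model for hard reasoning
print(pick_model("rename-files"))  # unknown kind falls back to Tier 1
```

Defaulting unknown tasks to the Tier 1 daily driver keeps the expensive models opt-in, which is the whole point of the tiering.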
Configuration Example
```yaml
# config.yaml for cost optimization
model:
  default: "moonshot/kimi-k2.5"        # Daily driver
  auxiliary:
    vision:
      provider: "openrouter"
      model: "google/gemini-2.5-flash" # Cheap vision
```
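For illustration, here is the same structure as a Python dict with a small hypothetical lookup helper; no harness ships this function, it only shows how the default/auxiliary split resolves:

```python
# Mirror of the YAML config above as a plain dict; model_for() is a
# hypothetical helper, not part of any harness.

CONFIG = {
    "model": {
        "default": "moonshot/kimi-k2.5",
        "auxiliary": {
            "vision": {
                "provider": "openrouter",
                "model": "google/gemini-2.5-flash",
            },
        },
    },
}

def model_for(task: str) -> str:
    """Return the auxiliary model for a task, else the default daily driver."""
    aux = CONFIG["model"]["auxiliary"].get(task)
    return aux["model"] if aux else CONFIG["model"]["default"]

print(model_for("vision"))  # auxiliary override: cheap vision model
print(model_for("coding"))  # no override: default daily driver
```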
Community Experience
Positive feedback on budget providers:
- "Fast, capable, and inexpensive"
- Significant cost savings vs frontier models
- Good enough for 80% of tasks
Trade-offs:
- May struggle with complex multi-step reasoning
- Tool calling slightly less reliable than Claude
- Context understanding not as nuanced
Cost Optimization Summary
| Strategy | Savings |
|---|---|
| Use Kimi/DeepSeek for routine tasks | 50-90% |
| Enable provider caching | 75-90% |
| Reserve Claude/GPT for complex tasks | Variable |
| Use cheaper vision models | 50-70% |
| Short sessions (--fresh) | Reduces context buildup |