# Budget Providers Feedback (Kimi, DeepSeek, MiniMax)
**Source reference:** Community guides, official integration docs, API documentation

---
## Kimi / Moonshot AI (K2.5)
**Recommendation:** Primary budget-friendly option
### Why Kimi K2.5?
**Source:** https://hermes-agent.ai/blog/hermes-agent-api-keys
> "For most users: Kimi K2.5 from Moonshot or MiniMax as a daily driver — both are fast, capable, and inexpensive. Use Claude Sonnet or GPT-4 only for complex reasoning tasks where the extra capability is worth the significantly higher per-token cost."
### Caching Benefits
| Provider | Cache Discount |
|----------|----------------|
| Kimi K2.5 | 75% off on cache hits |
| DeepSeek | 90% off on cache hits |
| Claude/Anthropic | Full price (no special discount) |
### Cost Comparison
**Feature implementation scenario (~100 API calls):**
- Claude Sonnet 4.5: ~$34
- Kimi K2.5: ~$3-8 (depending on caching)
- DeepSeek (cache hits): Under $1
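The interaction between cache-hit ratio and cache discount drives these numbers. A minimal sketch of the arithmetic, where the discount percentages come from the table above but the per-million-token prices and the 80% hit ratio are illustrative assumptions, not official rate cards:

```python
# Sketch: effective input cost under provider cache discounts.
# Only the discounts (75% Kimi, 90% DeepSeek) come from the doc;
# prices per million tokens and the hit ratio are assumptions.

def effective_input_cost(tokens: int, price_per_mtok: float,
                         cache_hit_ratio: float, cache_discount: float) -> float:
    """USD cost for `tokens` input tokens when a fraction of them
    hit the provider cache and are billed at a discounted rate."""
    cached = tokens * cache_hit_ratio
    fresh = tokens - cached
    rate = price_per_mtok / 1_000_000
    return fresh * rate + cached * rate * (1 - cache_discount)

# 100 calls reusing a ~50k-token context, 80% served from cache:
kimi = 100 * effective_input_cost(50_000, 0.60, 0.80, 0.75)      # assumed $0.60/Mtok
deepseek = 100 * effective_input_cost(50_000, 0.27, 0.80, 0.90)  # assumed $0.27/Mtok
print(f"Kimi ≈ ${kimi:.2f}, DeepSeek ≈ ${deepseek:.2f}")
```

Output tokens (billed at full price) are excluded, which is why real-world totals land higher than this input-only estimate.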
---
## DeepSeek
**Best for:** Maximum cost savings with caching
### Caching Advantage
**Source:** https://hermes-agent.ai/blog/hermes-agent-token-overhead
> "DeepSeek (90% off on cache) — Biggest cost lever"
### Use Cases
- Routine file organization
- Simple message responses
- Cron job executions
- Research lookups
---
## MiniMax
**Integration:** Officially supported (documented on MiniMax's platform)
**Source:** https://platform.minimax.io/docs/token-plan/hermes-agent
> "Use MiniMax-M2.7 in Hermes Agent for autonomous AI-powered development."
### Token Plan
- Different from pay-as-you-go API keys
- Subscribe to Token Plan first
- Create Token Plan API Key from the Token Plan page
---
## Other Budget Options
### Z.AI / ZhipuAI (GLM Models)
- Good for Chinese language tasks
- Competitive pricing
- OpenAI-compatible endpoint
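"OpenAI-compatible endpoint" means the standard chat-completions request shape works unchanged; only the base URL, model ID, and API key differ. A sketch of building such a request — the URL and model name below are placeholder assumptions, not the provider's documented values:

```python
import json
import urllib.request

# Build a standard OpenAI-style chat-completions request for any
# OpenAI-compatible provider. BASE_URL and MODEL are placeholders;
# substitute the values from the provider's own docs.
BASE_URL = "https://example-provider.com/v1"
MODEL = "glm-4"

payload = {
    "model": MODEL,
    "messages": [{"role": "user", "content": "Summarize this diff."}],
}
req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": "Bearer $API_KEY",  # real key goes here
        "Content-Type": "application/json",
    },
)
# urllib.request.urlopen(req) would send it; omitted here.
```

Because the wire format is identical, switching providers is a config change rather than a code change.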
### Alibaba Cloud DashScope
- Qwen model access
- Regional availability advantages
### OpenCode Zen / Go
- Curated model access
- Budget-friendly options
---
## Provider Selection Strategy
### Tier 1: Daily Driver (High Volume, Lower Cost)
- **Kimi K2.5** - 75% cache discount, good capabilities
- **DeepSeek** - 90% cache discount, cheapest option
- **MiniMax** - Fast, capable, inexpensive
### Tier 2: Complex Tasks (Selective Use)
- **Claude Sonnet** - Best reasoning, highest cost
- **GPT-4** - Good for specific use cases
### Tier 3: Auxiliary Tasks
- **Gemini Flash** - Vision tasks, cheap
- **Local models** - Free but require hardware
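The three tiers above can be sketched as a simple routing table: cheap models by default, expensive ones only when the task kind demands it. The model IDs and task categories here are illustrative assumptions, not a fixed harness API:

```python
# Sketch of the tiered selection strategy. Model names and task
# categories are illustrative, not an actual harness interface.

ROUTES = {
    "routine": "deepseek/deepseek-chat",   # Tier 1: cheapest with caching
    "general": "moonshot/kimi-k2.5",       # Tier 1: daily driver
    "complex": "anthropic/claude-sonnet",  # Tier 2: hard reasoning only
    "vision":  "google/gemini-2.5-flash",  # Tier 3: cheap auxiliary
}

def pick_model(task_kind: str) -> str:
    """Route a task to a model; unknown kinds fall back to the daily driver."""
    return ROUTES.get(task_kind, ROUTES["general"])
```

Defaulting unknown tasks to the Tier 1 daily driver keeps the expensive Tier 2 models opt-in rather than accidental.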
---
## Configuration Example
```yaml
# config.yaml for cost optimization
model:
  default: "moonshot/kimi-k2.5"  # Daily driver
  auxiliary:
    vision:
      provider: "openrouter"
      model: "google/gemini-2.5-flash"  # Cheap vision
```
---
## Community Experience
**Positive feedback on budget providers:**
- "Fast, capable, and inexpensive"
- Significant cost savings vs frontier models
- Good enough for 80% of tasks
**Trade-offs:**
- May struggle with complex multi-step reasoning
- Tool calling slightly less reliable than Claude
- Context understanding not as nuanced
---
## Cost Optimization Summary
| Strategy | Savings |
|----------|---------|
| Use Kimi/DeepSeek for routine tasks | 50-90% |
| Enable provider caching | 75-90% |
| Reserve Claude/GPT for complex tasks | Variable |
| Use cheaper vision models | 50-70% |
| Short sessions (`--fresh`) | Reduces context buildup |