# Budget Providers Feedback (Kimi, DeepSeek, MiniMax)

**Source reference:** Community guides, official integration docs, API documentation

---

## Kimi / Moonshot AI (K2.5)

**Recommendation:** Primary budget-friendly option

### Why Kimi K2.5?

**Source:** https://hermes-agent.ai/blog/hermes-agent-api-keys

> "For most users: Kimi K2.5 from Moonshot or MiniMax as a daily driver — both are fast, capable, and inexpensive. Use Claude Sonnet or GPT-4 only for complex reasoning tasks where the extra capability is worth the significantly higher per-token cost."

### Caching Benefits

| Provider | Cache Discount |
|----------|----------------|
| Kimi K2.5 | 75% off on cache hits |
| DeepSeek | 90% off on cache hits |
| Claude/Anthropic | Full price (no special discount) |

### Cost Comparison

**Feature implementation scenario (~100 API calls):**

- Claude Sonnet 4.5: ~$34
- Kimi K2.5: ~$3-8 (depending on caching)
- DeepSeek (cache hits): Under $1

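The gap between these numbers is mostly the cache discount; a minimal sketch of the blended-cost arithmetic (the per-call base costs and cache-hit rates below are illustrative assumptions, not published provider rates):

```python
# Illustrative only: per-call base costs and cache-hit rates are assumptions,
# not published provider pricing.
def scenario_cost(calls, base_cost_per_call, cache_discount, hit_rate):
    """Blended cost when a fraction of calls hit the provider's prompt cache."""
    hit_cost = base_cost_per_call * (1 - cache_discount)  # discounted price on cache hits
    miss_cost = base_cost_per_call                        # full price on cache misses
    return calls * (hit_rate * hit_cost + (1 - hit_rate) * miss_cost)

# ~100-call feature scenario with assumed per-call base costs:
claude = scenario_cost(100, 0.34, 0.0, 0.0)     # no cache discount, ~$34
kimi = scenario_cost(100, 0.10, 0.75, 0.8)      # 75% off on 80% of calls, ~$4
deepseek = scenario_cost(100, 0.05, 0.90, 0.9)  # 90% off on 90% of calls, <$1
```

With these assumed inputs the formula reproduces the rough magnitudes above: full price for Claude, a few dollars for Kimi, and under a dollar for DeepSeek at a high hit rate.
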
---

## DeepSeek

**Best for:** Maximum cost savings with caching

### Caching Advantage

**Source:** https://hermes-agent.ai/blog/hermes-agent-token-overhead

> "DeepSeek (90% off on cache) — Biggest cost lever"

### Use Cases

- Routine file organization
- Simple message responses
- Cron job executions
- Research lookups

---

## MiniMax

**Integration:** Official partnership/support

**Source:** https://platform.minimax.io/docs/token-plan/hermes-agent

> "Use MiniMax-M2.7 in Hermes Agent for autonomous AI-powered development."

### Token Plan

- Different from pay-as-you-go API keys
- Subscribe to a Token Plan first
- Create a Token Plan API key from the Token Plan page

---

## Other Budget Options

### Z.AI / ZhipuAI (GLM Models)

- Good for Chinese-language tasks
- Competitive pricing
- OpenAI-compatible endpoint

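An OpenAI-compatible endpoint means the familiar `/chat/completions` request shape works unchanged; only the base URL (and key) differ per provider. A sketch of building such a request (the base URL and model name here are hypothetical placeholders, not the provider's actual values — check their docs):

```python
import json

def chat_completion_request(base_url, api_key, model, messages):
    """Build the URL, headers, and JSON body for an OpenAI-style
    /chat/completions call; only the base URL changes per provider."""
    url = base_url.rstrip("/") + "/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"model": model, "messages": messages})
    return url, headers, body

# Hypothetical base URL and model name, for illustration only:
url, headers, body = chat_completion_request(
    "https://api.example-glm-provider.com/v1",
    "YOUR_API_KEY",
    "glm-4",
    [{"role": "user", "content": "hello"}],
)
```

Because the request shape is identical, switching between such providers is usually a one-line base-URL change in the harness config.
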
### Alibaba Cloud DashScope

- Qwen model access
- Regional availability advantages

### OpenCode Zen / Go

- Curated model access
- Budget-friendly options

---

## Provider Selection Strategy

### Tier 1: Daily Driver (High Volume, Lower Cost)

- **Kimi K2.5** - 75% cache discount, good capabilities
- **DeepSeek** - 90% cache discount, cheapest option
- **MiniMax** - Fast, capable, inexpensive

### Tier 2: Complex Tasks (Selective Use)

- **Claude Sonnet** - Best reasoning, highest cost
- **GPT-4** - Good for specific use cases

### Tier 3: Auxiliary Tasks

- **Gemini Flash** - Vision tasks, cheap
- **Local models** - Free but require hardware

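The three tiers amount to a simple routing rule; a sketch (the model identifiers are illustrative placeholders, and the complexity signal is whatever heuristic the harness already uses):

```python
def pick_model(complexity, needs_vision=False):
    """Route a task to a tier: cheap daily driver by default, a frontier
    model only for complex reasoning, a cheap vision model for image tasks.
    Model identifiers here are illustrative placeholders."""
    if needs_vision:
        return "google/gemini-2.5-flash"   # Tier 3: cheap vision
    if complexity == "complex":
        return "anthropic/claude-sonnet"   # Tier 2: reserved for hard reasoning
    return "moonshot/kimi-k2.5"            # Tier 1: budget daily driver
```

The point of the rule is that the expensive branch is opt-in: everything defaults to the budget tier unless the task explicitly qualifies.
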
---

## Configuration Example

```yaml
# config.yaml for cost optimization
model:
  default: "moonshot/kimi-k2.5"  # Daily driver

auxiliary:
  vision:
    provider: "openrouter"
    model: "google/gemini-2.5-flash"  # Cheap vision
```

---

## Community Experience

**Positive feedback on budget providers:**

- "Fast, capable, and inexpensive"
- Significant cost savings vs frontier models
- Good enough for 80% of tasks

**Trade-offs:**

- May struggle with complex multi-step reasoning
- Tool calling slightly less reliable than Claude
- Context understanding not as nuanced

---

## Cost Optimization Summary

| Strategy | Savings |
|----------|---------|
| Use Kimi/DeepSeek for routine tasks | 50-90% |
| Enable provider caching | 75-90% |
| Reserve Claude/GPT for complex tasks | Variable |
| Use cheaper vision models | 50-70% |
| Short sessions (`--fresh`) | Reduces context buildup |