# Budget Providers Feedback (Kimi, DeepSeek, MiniMax)
**Source reference:** Community guides, official integration docs, API documentation

---
## Kimi / Moonshot AI (K2.5)
**Recommendation:** Primary budget-friendly option
### Why Kimi K2.5?
**Source:** https://hermes-agent.ai/blog/hermes-agent-api-keys
> "For most users: Kimi K2.5 from Moonshot or MiniMax as a daily driver — both are fast, capable, and inexpensive. Use Claude Sonnet or GPT-4 only for complex reasoning tasks where the extra capability is worth the significantly higher per-token cost."
### Caching Benefits
| Provider | Cache Discount |
|----------|----------------|
| Kimi K2.5 | 75% off on cache hits |
| DeepSeek | 90% off on cache hits |
| Claude/Anthropic | Full price (no special discount) |
### Cost Comparison
**Feature implementation scenario (~100 API calls):**
- Claude Sonnet 4.5: ~$34
- Kimi K2.5: ~$3-8 (depending on caching)
- DeepSeek (cache hits): Under $1
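The interaction between cache-hit ratio and cache discount drives these numbers. A minimal sketch of the arithmetic, where the discount percentages come from the table above but the per-million-token prices and the 80% hit ratio are illustrative assumptions, not official rate cards:

```python
# Sketch: effective input cost under provider cache discounts.
# Only the discounts (75% Kimi, 90% DeepSeek) come from the doc;
# prices per million tokens and the hit ratio are assumptions.

def effective_input_cost(tokens: int, price_per_mtok: float,
                         cache_hit_ratio: float, cache_discount: float) -> float:
    """USD cost for `tokens` input tokens when a fraction of them
    hit the provider cache and are billed at a discounted rate."""
    cached = tokens * cache_hit_ratio
    fresh = tokens - cached
    rate = price_per_mtok / 1_000_000
    return fresh * rate + cached * rate * (1 - cache_discount)

# 100 calls reusing a ~50k-token context, 80% served from cache:
kimi = 100 * effective_input_cost(50_000, 0.60, 0.80, 0.75)      # assumed $0.60/Mtok
deepseek = 100 * effective_input_cost(50_000, 0.27, 0.80, 0.90)  # assumed $0.27/Mtok
print(f"Kimi ≈ ${kimi:.2f}, DeepSeek ≈ ${deepseek:.2f}")
```

Output tokens (billed at full price) are excluded, which is why real-world totals land higher than this input-only estimate.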
---
## DeepSeek
**Best for:** Maximum cost savings with caching
### Caching Advantage
**Source:** https://hermes-agent.ai/blog/hermes-agent-token-overhead
> "DeepSeek (90% off on cache) — Biggest cost lever"
### Use Cases
- Routine file organization
- Simple message responses
- Cron job executions
- Research lookups
---
## MiniMax
**Integration:** Officially supported (documented on MiniMax's platform)
**Source:** https://platform.minimax.io/docs/token-plan/hermes-agent
> "Use MiniMax-M2.7 in Hermes Agent for autonomous AI-powered development."
### Token Plan
- Different from pay-as-you-go API keys
- Subscribe to Token Plan first
- Create Token Plan API Key from the Token Plan page
---
## Other Budget Options
### Z.AI / ZhipuAI (GLM Models)
- Good for Chinese language tasks
- Competitive pricing
- OpenAI-compatible endpoint
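"OpenAI-compatible endpoint" means the standard chat-completions request shape works unchanged; only the base URL, model ID, and API key differ. A sketch of building such a request — the URL and model name below are placeholder assumptions, not the provider's documented values:

```python
import json
import urllib.request

# Build a standard OpenAI-style chat-completions request for any
# OpenAI-compatible provider. BASE_URL and MODEL are placeholders;
# substitute the values from the provider's own docs.
BASE_URL = "https://example-provider.com/v1"
MODEL = "glm-4"

payload = {
    "model": MODEL,
    "messages": [{"role": "user", "content": "Summarize this diff."}],
}
req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": "Bearer $API_KEY",  # real key goes here
        "Content-Type": "application/json",
    },
)
# urllib.request.urlopen(req) would send it; omitted here.
```

Because the wire format is identical, switching providers is a config change rather than a code change.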
### Alibaba Cloud DashScope
- Qwen model access
- Regional availability advantages
### OpenCode Zen / Go
- Curated model access
- Budget-friendly options
---
## Provider Selection Strategy
### Tier 1: Daily Driver (High Volume, Lower Cost)
- **Kimi K2.5** - 75% cache discount, good capabilities
- **DeepSeek** - 90% cache discount, cheapest option
- **MiniMax** - Fast, capable, inexpensive
### Tier 2: Complex Tasks (Selective Use)
- **Claude Sonnet** - Best reasoning, highest cost
- **GPT-4** - Good for specific use cases
### Tier 3: Auxiliary Tasks
- **Gemini Flash** - Vision tasks, cheap
- **Local models** - Free but require hardware
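The three tiers above can be sketched as a simple routing table: cheap models by default, expensive ones only when the task kind demands it. The model IDs and task categories here are illustrative assumptions, not a fixed harness API:

```python
# Sketch of the tiered selection strategy. Model names and task
# categories are illustrative, not an actual harness interface.

ROUTES = {
    "routine": "deepseek/deepseek-chat",   # Tier 1: cheapest with caching
    "general": "moonshot/kimi-k2.5",       # Tier 1: daily driver
    "complex": "anthropic/claude-sonnet",  # Tier 2: hard reasoning only
    "vision":  "google/gemini-2.5-flash",  # Tier 3: cheap auxiliary
}

def pick_model(task_kind: str) -> str:
    """Route a task to a model; unknown kinds fall back to the daily driver."""
    return ROUTES.get(task_kind, ROUTES["general"])
```

Defaulting unknown tasks to the Tier 1 daily driver keeps the expensive Tier 2 models opt-in rather than accidental.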
---
## Configuration Example
```yaml
# config.yaml for cost optimization
model:
  default: "moonshot/kimi-k2.5"  # Daily driver
  auxiliary:
    vision:
      provider: "openrouter"
      model: "google/gemini-2.5-flash"  # Cheap vision
```
---
## Community Experience
**Positive feedback on budget providers:**
- "Fast, capable, and inexpensive"
- Significant cost savings vs frontier models
- Good enough for 80% of tasks
**Trade-offs:**
- May struggle with complex multi-step reasoning
- Tool calling slightly less reliable than Claude
- Context understanding not as nuanced
---
## Cost Optimization Summary
| Strategy | Savings |
|----------|---------|
| Use Kimi/DeepSeek for routine tasks | 50-90% |
| Enable provider caching | 75-90% |
| Reserve Claude/GPT for complex tasks | Variable |
| Use cheaper vision models | 50-70% |
| Short sessions (`--fresh`) | Reduces context buildup |