# Budget Providers Feedback (Kimi, DeepSeek, MiniMax)

*Source reference: Community guides, official integration docs, API documentation*


## Kimi / Moonshot AI (K2.5)

**Recommendation:** Primary budget-friendly option

### Why Kimi K2.5?

Source: https://hermes-agent.ai/blog/hermes-agent-api-keys

> "For most users: Kimi K2.5 from Moonshot or MiniMax as a daily driver — both are fast, capable, and inexpensive. Use Claude Sonnet or GPT-4 only for complex reasoning tasks where the extra capability is worth the significantly higher per-token cost."

### Caching Benefits

| Provider | Cache Discount |
|---|---|
| Kimi K2.5 | 75% off on cache hits |
| DeepSeek | 90% off on cache hits |
| Claude/Anthropic | Full price (no special discount) |

### Cost Comparison

Feature implementation scenario (~100 API calls):

- Claude Sonnet 4.5: ~$34
- Kimi K2.5: ~$3-8 (depending on caching)
- DeepSeek (cache hits): under $1
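The cache math behind figures like these can be sketched as a blend of full-price misses and discounted hits. The per-call price, discount, and hit rate below are illustrative assumptions, not quoted provider rates:

```python
# Hypothetical cost sketch: how a cache discount changes a ~100-call job.
# All numbers here are placeholders for illustration only.

def estimate_cost(calls: int, price_per_call: float,
                  cache_discount: float, cache_hit_rate: float) -> float:
    """Blend full-price cache misses with discounted cache hits."""
    hit_cost = calls * cache_hit_rate * price_per_call * (1 - cache_discount)
    miss_cost = calls * (1 - cache_hit_rate) * price_per_call
    return hit_cost + miss_cost

# Example: 100 calls at $0.10/call nominal, 75% cache discount, 80% hit rate
cost = estimate_cost(100, 0.10, 0.75, 0.80)
print(f"${cost:.2f}")  # $4.00 — 80 hits at $0.025 + 20 misses at $0.10
```

The hit rate is the real lever: agent harnesses that keep a stable prompt prefix across calls push it high, which is why the same job can land anywhere in a wide cost band.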

## DeepSeek

**Best for:** Maximum cost savings with caching

### Caching Advantage

Source: https://hermes-agent.ai/blog/hermes-agent-token-overhead

> "DeepSeek (90% off on cache) — Biggest cost lever"

### Use Cases

- Routine file organization
- Simple message responses
- Cron job executions
- Research lookups

## MiniMax

**Integration:** Official partnership/support

Source: https://platform.minimax.io/docs/token-plan/hermes-agent

> "Use MiniMax-M2.7 in Hermes Agent for autonomous AI-powered development."

### Token Plan

- Different from pay-as-you-go API keys
- Subscribe to a Token Plan first
- Create a Token Plan API key from the Token Plan page

## Other Budget Options

### Z.AI / ZhipuAI (GLM Models)

- Good for Chinese-language tasks
- Competitive pricing
- OpenAI-compatible endpoint
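Because several of these providers advertise OpenAI-compatible endpoints, one generic client can target any of them by swapping the base URL. The base URL, model name, and key below are placeholders, not real provider values — check each provider's docs for the actual endpoint:

```python
# Sketch: calling any OpenAI-compatible budget provider with only the stdlib.
# BASE_URL is a placeholder; substitute the provider's documented endpoint.
import json
import urllib.request

BASE_URL = "https://example-provider.com/v1"  # placeholder, provider-specific

def build_chat_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Assemble an OpenAI-style POST /chat/completions request."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = build_chat_request("glm-4", "Summarize this diff", "sk-...")
# resp = urllib.request.urlopen(req)  # uncomment with a real key and endpoint
```

This is also why harness configs like the one further below only need a `provider` plus `model` pair: the wire format is the same across these vendors.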

### Alibaba Cloud DashScope

- Qwen model access
- Regional availability advantages

### OpenCode Zen / Go

- Curated model access
- Budget-friendly options

## Provider Selection Strategy

### Tier 1: Daily Driver (High Volume, Lower Cost)

- Kimi K2.5: 75% cache discount, good capabilities
- DeepSeek: 90% cache discount, cheapest option
- MiniMax: fast, capable, inexpensive

### Tier 2: Complex Tasks (Selective Use)

- Claude Sonnet: best reasoning, highest cost
- GPT-4: good for specific use cases

### Tier 3: Auxiliary Tasks

- Gemini Flash: vision tasks, cheap
- Local models: free but require hardware
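The three tiers above amount to a simple routing table. A minimal sketch, with illustrative task categories and model ids (map them to your actual workload):

```python
# Hypothetical tier router: cheap daily driver by default, expensive models
# only for explicitly flagged task kinds. Model ids are illustrative.

TIERS = {
    "routine": "moonshot/kimi-k2.5",       # Tier 1: daily driver
    "complex": "anthropic/claude-sonnet",  # Tier 2: hard multi-step reasoning
    "vision": "google/gemini-2.5-flash",   # Tier 3: cheap auxiliary vision
}

def pick_model(task_kind: str) -> str:
    """Default to the cheap daily driver for anything unrecognized."""
    return TIERS.get(task_kind, TIERS["routine"])

print(pick_model("complex"))   # anthropic/claude-sonnet
print(pick_model("cron-job"))  # falls back to moonshot/kimi-k2.5
```

Defaulting to the cheap tier keeps the failure mode benign: an unclassified task costs pennies rather than frontier-model rates.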

## Configuration Example

```yaml
# config.yaml for cost optimization
model:
  default: "moonshot/kimi-k2.5"  # Daily driver

auxiliary:
  vision:
    provider: "openrouter"
    model: "google/gemini-2.5-flash"  # Cheap vision
```

## Community Experience

Positive feedback on budget providers:

- "Fast, capable, and inexpensive"
- Significant cost savings vs. frontier models
- Good enough for roughly 80% of tasks

Trade-offs:

- May struggle with complex multi-step reasoning
- Tool calling is slightly less reliable than Claude's
- Context understanding is not as nuanced

## Cost Optimization Summary

| Strategy | Savings |
|---|---|
| Use Kimi/DeepSeek for routine tasks | 50-90% |
| Enable provider caching | 75-90% |
| Reserve Claude/GPT for complex tasks | Variable |
| Use cheaper vision models | 50-70% |
| Short sessions (`--fresh`) | Reduces context buildup |