Harnesses under analysis:
- opencode (Go-based coding agent)
- pi (minimal terminal coding harness by Mario Zechner)
- hermes (Nous Research agent)
- forgecode (AI pair programmer with sub-agents)

Each harness folder contains:
- repo/: Source code from respective repositories
- feedback/localllm/: Community feedback for local/smaller models
- feedback/frontier/: Community feedback for frontier models

Research focus: Tool handling, skills systems, prompt engineering, context management, and best practices for smaller/local models.
Budget Providers Feedback (Kimi, DeepSeek, MiniMax)
Source reference: Community guides, official integration docs, API documentation
Kimi / Moonshot AI (K2.5)
Recommendation: Primary budget-friendly option
Why Kimi K2.5?
Source: https://hermes-agent.ai/blog/hermes-agent-api-keys
"For most users: Kimi K2.5 from Moonshot or MiniMax as a daily driver — both are fast, capable, and inexpensive. Use Claude Sonnet or GPT-4 only for complex reasoning tasks where the extra capability is worth the significantly higher per-token cost."
Caching Benefits
| Provider | Cache Discount |
|---|---|
| Kimi K2.5 | 75% off on cache hits |
| DeepSeek | 90% off on cache hits |
| Claude/Anthropic | Full price (no special discount) |
Cost Comparison
Feature implementation scenario (~100 API calls):
- Claude Sonnet 4.5: ~$34
- Kimi K2.5: ~$3-8 (depending on caching)
- DeepSeek (cache hits): Under $1
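The gap above falls out of the cache arithmetic. A minimal sketch, using illustrative per-million-token prices and an assumed 80% cache-hit ratio (these figures are assumptions for the example, not the providers' published rates):

```python
# Illustrative cost arithmetic for a caching scenario. Prices, token
# counts, and hit ratios below are assumptions, not official rates.

def scenario_cost(price_per_mtok: float, cache_discount: float,
                  cache_hit_ratio: float, total_mtok: float) -> float:
    """Blend full-price and discounted tokens into one scenario cost."""
    cached = total_mtok * cache_hit_ratio * price_per_mtok * (1 - cache_discount)
    fresh = total_mtok * (1 - cache_hit_ratio) * price_per_mtok
    return cached + fresh

# Assume ~100 calls consuming 10M input tokens total, 80% cache hits.
TOTAL_MTOK = 10.0
for name, price, discount in [
    ("No cache discount (Claude-style)", 3.00, 0.00),
    ("75% off cache hits (Kimi-style)", 0.60, 0.75),
    ("90% off cache hits (DeepSeek-style)", 0.27, 0.90),
]:
    cost = scenario_cost(price, discount, 0.8, TOTAL_MTOK)
    print(f"{name}: ${cost:.2f}")
```

Even with made-up prices, the shape matches the comparison above: the cheaper base rate and the steeper cache discount compound, which is why the cached-DeepSeek case lands an order of magnitude below the no-discount case.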
DeepSeek
Best for: Maximum cost savings with caching
Caching Advantage
Source: https://hermes-agent.ai/blog/hermes-agent-token-overhead
"DeepSeek (90% off on cache) — Biggest cost lever"
Use Cases
- Routine file organization
- Simple message responses
- Cron job executions
- Research lookups
MiniMax
Integration: Official partnership/support
Source: https://platform.minimax.io/docs/token-plan/hermes-agent
"Use MiniMax-M2.7 in Hermes Agent for autonomous AI-powered development."
Token Plan
- Billed separately from pay-as-you-go API keys
- Subscribe to a Token Plan first
- Then create a Token Plan API key from the Token Plan page
Other Budget Options
Z.AI / ZhipuAI (GLM Models)
- Good for Chinese language tasks
- Competitive pricing
- OpenAI-compatible endpoint
Alibaba Cloud DashScope
- Qwen model access
- Regional availability advantages
OpenCode Zen / Go
- Curated model access
- Budget-friendly options
Provider Selection Strategy
Tier 1: Daily Driver (High Volume, Lower Cost)
- Kimi K2.5 - 75% cache discount, good capabilities
- DeepSeek - 90% cache discount, cheapest option
- MiniMax - Fast, capable, inexpensive
Tier 2: Complex Tasks (Selective Use)
- Claude Sonnet - Best reasoning, highest cost
- GPT-4 - Good for specific use cases
Tier 3: Auxiliary Tasks
- Gemini Flash - Vision tasks, cheap
- Local models - Free but require hardware
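The three tiers above can be sketched as a simple routing table. The task labels and fallback rule here are illustrative, not any harness's actual API:

```python
# Hypothetical tier-based model routing. Task labels and model
# identifiers are illustrative placeholders, not real config keys.

TIERS = {
    "routine": "moonshot/kimi-k2.5",       # Tier 1: daily driver
    "cheapest": "deepseek/deepseek-chat",  # Tier 1: max cache savings
    "complex": "anthropic/claude-sonnet",  # Tier 2: hard reasoning
    "vision": "google/gemini-2.5-flash",   # Tier 3: cheap vision
}

def pick_model(task_kind: str) -> str:
    """Route known task kinds to their tier; default to the daily driver."""
    return TIERS.get(task_kind, TIERS["routine"])

print(pick_model("complex"))       # Tier 2 model for hard reasoning
print(pick_model("rename-files"))  # unknown kind falls back to Tier 1
```

Defaulting unknown tasks to the Tier 1 daily driver keeps the expensive models opt-in, which is the whole point of the tiering.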
Configuration Example
```yaml
# config.yaml for cost optimization
model:
  default: "moonshot/kimi-k2.5"        # Daily driver
  auxiliary:
    vision:
      provider: "openrouter"
      model: "google/gemini-2.5-flash" # Cheap vision
```
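For illustration, here is the same structure as a Python dict with a small hypothetical lookup helper; no harness ships this function, it only shows how the default/auxiliary split resolves:

```python
# Mirror of the YAML config above as a plain dict; model_for() is a
# hypothetical helper, not part of any harness.

CONFIG = {
    "model": {
        "default": "moonshot/kimi-k2.5",
        "auxiliary": {
            "vision": {
                "provider": "openrouter",
                "model": "google/gemini-2.5-flash",
            },
        },
    },
}

def model_for(task: str) -> str:
    """Return the auxiliary model for a task, else the default daily driver."""
    aux = CONFIG["model"]["auxiliary"].get(task)
    return aux["model"] if aux else CONFIG["model"]["default"]

print(model_for("vision"))  # auxiliary override: cheap vision model
print(model_for("coding"))  # no override: default daily driver
```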
Community Experience
Positive feedback on budget providers:
- "Fast, capable, and inexpensive"
- Significant cost savings vs frontier models
- Good enough for 80% of tasks
Trade-offs:
- May struggle with complex multi-step reasoning
- Tool calling slightly less reliable than Claude
- Context understanding not as nuanced
Cost Optimization Summary
| Strategy | Savings |
|---|---|
| Use Kimi/DeepSeek for routine tasks | 50-90% |
| Enable provider caching | 75-90% |
| Reserve Claude/GPT for complex tasks | Variable |
| Use cheaper vision models | 50-70% |
| Short sessions (--fresh) | Reduces context buildup |