Investigation: 31k Token Context Issue

Problem

When making requests through opencode to local_swarm, the LLM receives ~31k tokens of context, even for a query as simple as listing an empty directory.

Root Cause Identified

This is NOT an issue with this repo's codebase; it is expected behavior for function calling.

How it works:

  1. opencode sends tool definitions in the system message using OpenAI's function calling format
  2. Each tool definition is ~450 tokens (name + description + parameters)
  3. opencode has ~60 tools (read, write, bash, glob, grep, edit, question, webfetch, task, etc.)
  4. Total tool definition tokens: ~27,000 tokens
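
For illustration, here is a minimal sketch of the request shape, assuming the OpenAI chat-completions format; the tool shown is an abbreviated stand-in, and the actual opencode definitions are far longer:

# Hypothetical, abbreviated request payload. Each entry in "tools" is a full
# JSON schema, and all of them are serialized into every request.
request = {
    "model": "gpt-4",
    "messages": [
        {"role": "system", "content": "You are a coding assistant."},
        {"role": "user", "content": "List the files in this directory."},
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "read",
                "description": "Read a file from the local filesystem.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "filePath": {
                            "type": "string",
                            "description": "Absolute path to the file.",
                        },
                    },
                    "required": ["filePath"],
                },
            },
        },
        # ... ~59 more definitions like this one, sent with every request
    ],
}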

Calculation:

Single tool definition: ~450 tokens
Number of tools: ~60
Tool schemas total: ~27,000 tokens
System message: ~500 tokens
User query: ~100 tokens
---
Total: ~27,600 tokens

This roughly matches the observed ~31k tokens.

Why This Happens

OpenAI's function calling protocol requires sending the complete function schemas to the LLM with every request. This is how the model:

  • Knows what tools are available
  • Understands parameter requirements
  • Knows how to format tool calls

All major function-calling implementations work this way (the OpenAI API, the Anthropic API, local models, etc.).

Verification

python -c "
import tiktoken
enc = tiktoken.get_encoding('cl100k_base')

# Abbreviated example of an actual opencode tool definition
# (the real description and parameters are much longer)
read_tool_schema = '''{\"type\": \"function\", \"function\": {\"name\": \"read\", \"description\": \"Read a file or directory from the local filesystem...[full description]\", \"parameters\": {...}}}'''

print(f'Single tool schema: {len(enc.encode(read_tool_schema))} tokens')
print(f'Estimated 60 tools: {len(enc.encode(read_tool_schema)) * 60:,} tokens')
"

Result:

  • Single tool definition: ~451 tokens
  • 60 tools: ~27,060 tokens
  • Plus system + user message: ~27,660 total

This Is NOT a Bug

The 31k token context is correct and expected for function calling with 60+ tools. This is how:

  • OpenAI API works
  • Claude API works
  • Local models with function calling work

Potential Optimizations (Optional)

If reducing context size is critical, consider:

Option 1: Dynamic Tool Selection

  • Only send tools relevant to current task
  • Example: For file operations, only send [read, write, glob, edit]
  • Trade-off: Requires opencode to intelligently filter tools
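
A rough sketch of what such filtering could look like; the task categories and tool groupings below are illustrative assumptions, not opencode behavior:

# Hypothetical sketch: choose a tool subset per task category
# before building the request.
TOOL_GROUPS = {
    "file_ops": ["read", "write", "glob", "edit"],
    "search": ["grep", "glob", "read"],
    "shell": ["bash"],
}

def select_tools(all_tools: list[dict], task_category: str) -> list[dict]:
    """Return only the tool definitions relevant to the current task."""
    wanted = set(TOOL_GROUPS.get(task_category, []))
    return [t for t in all_tools if t["function"]["name"] in wanted]

Sending 4 definitions instead of 60 would cut the schema overhead from ~27k to under 2k tokens, at the cost that the model cannot call tools outside the selected group.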

Option 2: Compressed Tool Descriptions

  • Shorten tool descriptions to essentials
  • Example: "Read file at path (required: filePath)"
  • Trade-off: Model may make more errors with less guidance
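
A quick way to measure the savings, reusing the tiktoken approach from the Verification section (the strings here are illustrative stand-ins, not opencode's actual descriptions):

import tiktoken

enc = tiktoken.get_encoding('cl100k_base')

# Illustrative stand-ins: a verbose description vs. a compressed one.
full = 'Read a file or directory from the local filesystem. Handles text files, respects ignore rules, returns contents with line numbers, and should be preferred over shell commands for reading...'
compressed = 'Read file at path (required: filePath)'

print(f'full: {len(enc.encode(full))} tokens')
print(f'compressed: {len(enc.encode(compressed))} tokens')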

Option 3: Tool Grouping

  • Group similar tools into single "tools: [read, write, glob]" parameter
  • Trade-off: Breaks OpenAI compatibility
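
One way to sketch the idea is a single merged schema with an operation enum (a hypothetical shape; opencode's dispatch expects one function per tool, which is where the compatibility break shows up):

# Hypothetical merged tool: one schema covering several operations,
# replacing three separate ~450-token definitions.
fs_tool = {
    "type": "function",
    "function": {
        "name": "fs",
        "description": "Filesystem operations: read, write, or glob.",
        "parameters": {
            "type": "object",
            "properties": {
                "operation": {"type": "string", "enum": ["read", "write", "glob"]},
                "filePath": {"type": "string"},
                "content": {"type": "string", "description": "Only used for write."},
            },
            "required": ["operation", "filePath"],
        },
    },
}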

Recommendation

NO ACTION REQUIRED. The 31k token context is:

  • Standard for function calling with many tools
  • Within the capabilities of modern LLMs (32k-128k context windows)
  • Not caused by this repo's code

The .opencodeignore created earlier will help with opencode's own system prompt, but doesn't affect the LLM context sent to local_swarm.

Additional Finding

While investigating, the following was also verified:

  • config/prompts/tool_instructions.txt: 125 tokens
  • This repo's tool execution code: No token bloat
  • Issue is purely opencode's function calling protocol
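
The 125-token figure can be reproduced with the same encoding used above (assuming the script is run from the repo root):

import tiktoken

enc = tiktoken.get_encoding('cl100k_base')
with open('config/prompts/tool_instructions.txt') as f:
    text = f.read()
print(f'tool_instructions.txt: {len(enc.encode(text))} tokens')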