# Investigation: 31k Token Context Issue
## Problem
When making requests through opencode to local_swarm, the LLM receives ~31k tokens of context, even for a trivial query such as listing an empty directory.
## Root Cause Identified
**NOT an issue with this repo's codebase - this is expected behavior for function calling.**
### How it works:
1. **opencode sends tool definitions** in the system message using OpenAI's function calling format (an abbreviated example follows this list)
2. **Each tool definition is ~450 tokens** (name + description + parameters)
3. **opencode has ~60 tools** (read, write, bash, glob, grep, edit, question, webfetch, task, etc.)
4. **Total tool definition tokens:** ~27,000 tokens
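For illustration, a single tool definition in OpenAI's function calling format looks roughly like the sketch below. The field contents are abbreviated here; the real `read` schema carries a much longer description and fuller parameter docs, which is where most of the ~450 tokens come from.
```python
# Abbreviated sketch of one tool definition in OpenAI's function calling
# format. Real opencode schemas have far longer descriptions and more
# parameters; the text fields dominate the per-tool token count.
read_tool = {
    "type": "function",
    "function": {
        "name": "read",
        "description": "Read a file or directory from the local filesystem...",
        "parameters": {
            "type": "object",
            "properties": {
                "filePath": {
                    "type": "string",
                    "description": "Absolute path of the file to read",
                },
            },
            "required": ["filePath"],
        },
    },
}
```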
### Calculation:
```
Single tool definition: ~450 tokens
Number of tools: ~60
Tool schemas total: ~27,000 tokens
System message: ~500 tokens
User query: ~100 tokens
---
Total: ~27,600 tokens
```
**This accounts for the bulk of the observed ~31k tokens.**
## Why This Happens
OpenAI's function calling protocol requires sending the **complete function schemas** to the LLM with every request. This is how the model:
- Knows what tools are available
- Understands parameter requirements
- Knows how to format tool calls
All major function calling implementations work this way (OpenAI, Anthropic, local models, etc.).
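Concretely, with the OpenAI Python client the full tool list is serialized into every request. A minimal sketch follows; the model name and the single-entry tool list are placeholders, and in opencode's case the list holds ~60 schemas:
```python
# Minimal sketch: the complete `tools` list rides along with every request,
# so its ~27k tokens are paid on each call regardless of the query.
from openai import OpenAI

client = OpenAI()

# Placeholder list; in opencode's case this holds ~60 schemas like `read`.
all_tools = [
    {
        "type": "function",
        "function": {
            "name": "read",
            "description": "Read a file from the local filesystem.",
            "parameters": {
                "type": "object",
                "properties": {"filePath": {"type": "string"}},
                "required": ["filePath"],
            },
        },
    },
]

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[{"role": "user", "content": "List the files in this directory."}],
    tools=all_tools,  # sent in full with every single request
)
print(response.choices[0].message)
```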
## Verification
```bash
python -c "
import tiktoken
enc = tiktoken.get_encoding('cl100k_base')
# Example from an actual opencode tool definition. The description and
# parameters are truncated here; the ~451-token figure below was measured
# against the full, untruncated schema.
read_tool_schema = '''{\"type\": \"function\", \"function\": {\"name\": \"read\", \"description\": \"Read a file or directory from the local filesystem...[full description]\", \"parameters\": {...}}}'''
print(f'Single tool schema: {len(enc.encode(read_tool_schema))} tokens')
print(f'Estimated 60 tools: {len(enc.encode(read_tool_schema)) * 60:,} tokens')
"
```
Result:
- Single tool definition: ~451 tokens
- 60 tools: ~27,060 tokens
- Plus system + user message: ~27,660 total
## This Is NOT a Bug
The 31k token context is **correct and expected** for function calling with 60+ tools. This is how:
- OpenAI API works
- Claude API works
- Local models with function calling work
## Potential Optimizations (Optional)
If reducing context size is critical, consider:
### Option 1: Dynamic Tool Selection
- Only send tools relevant to current task
- Example: For file operations, only send [read, write, glob, edit]
- Trade-off: Requires opencode to intelligently filter tools (see the sketch below)
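A minimal sketch of what such filtering might look like, assuming a hypothetical keyword-based task classifier; `TASK_TOOLSETS`, the categories, and the keywords are all illustrative, and opencode has no such mechanism today:
```python
# Hypothetical dynamic tool selection: send a task-relevant subset instead of
# all ~60 schemas. Categories, keywords, and toolsets are illustrative only.
TASK_TOOLSETS = {
    "shell": ["bash"],
    "search": ["grep", "glob", "read"],
    "file_ops": ["read", "write", "glob", "edit"],
}

def select_tools(query: str, all_tools: list[dict]) -> list[dict]:
    """Return only the tool schemas relevant to the query's task category."""
    q = query.lower()
    if any(w in q for w in ("run", "execute", "command")):
        names = TASK_TOOLSETS["shell"]
    elif any(w in q for w in ("find", "search", "grep")):
        names = TASK_TOOLSETS["search"]
    else:
        names = TASK_TOOLSETS["file_ops"]  # default: file operations
    return [t for t in all_tools if t["function"]["name"] in names]
```
Sending four ~450-token schemas instead of sixty would cut the tool overhead from ~27,000 to roughly 1,800 tokens.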
### Option 2: Compressed Tool Descriptions
- Shorten tool descriptions to essentials
- Example: "Read file at path (required: filePath)"
- Trade-off: Model may make more errors with less guidance (see the token comparison below)
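As a rough illustration of the savings, compare the token counts of a verbose and a terse description; both strings are invented for this comparison:
```python
# Rough illustration: shorter descriptions shrink each schema.
# Both description strings are made up for this comparison.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
verbose = (
    "Read a file or directory from the local filesystem. Accepts an absolute "
    "path, returns the contents as text, and raises an error if the path "
    "does not exist or is not readable by the current user."
)
terse = "Read file at path (required: filePath)"
print(f"verbose: {len(enc.encode(verbose))} tokens")
print(f"terse:   {len(enc.encode(terse))} tokens")
```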
### Option 3: Tool Grouping
- Group similar tools into single "tools: [read, write, glob]" parameter
- Trade-off: Breaks OpenAI compatibility (one possible shape is sketched below)
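One way such grouping might look is a single dispatcher schema with an enum of operations. This is illustrative only; the compatibility break is that OpenAI-style clients and tool runners expect one schema per tool:
```python
# Hypothetical grouped schema: one "fs" tool dispatching several operations.
# Illustrative only; OpenAI-style callers expect one schema per tool, so
# downstream tooling would need to understand this convention.
fs_tool = {
    "type": "function",
    "function": {
        "name": "fs",
        "description": "Filesystem operations: read, write, or glob.",
        "parameters": {
            "type": "object",
            "properties": {
                "op": {"type": "string", "enum": ["read", "write", "glob"]},
                "path": {"type": "string"},
                "content": {"type": "string", "description": "write only"},
            },
            "required": ["op", "path"],
        },
    },
}
```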
## Recommendation
**NO ACTION REQUIRED.** The 31k token context is:
- Standard for function calling with many tools
- Within the capabilities of modern LLMs (32k-128k context windows)
- Not caused by this repo's code
The `.opencodeignore` created earlier will help with opencode's own system prompt, but doesn't affect the LLM context sent to local_swarm.
## Additional Finding
While investigating, the following were also verified:
- `config/prompts/tool_instructions.txt`: 125 tokens ✅
- This repo's tool execution code: No token bloat ✅
- Issue is purely opencode's function calling protocol ✅