c18c20487c
- Add --use-opencode-tools flag to main.py - Default: local tool server mode (~125 tokens, saves ~27k tokens) - Optional: opencode tools mode (~27k tokens, full tool definitions) - Create .opencodeignore to exclude large docs from context - Update design doc with token bloat analysis This allows users to choose between: - Local tool server: Minimal tool instructions, saves 27k tokens - Opencode tools: Full tool definitions, more robust but expensive
99 lines
3.2 KiB
Markdown
99 lines
3.2 KiB
Markdown
# Investigation: 31k Token Context Issue
|
|
|
|
## Problem
|
|
When making requests through opencode to local_swarm, the LLM receives ~31k tokens of context even for simple empty directory queries.
|
|
|
|
## Root Cause Identified
|
|
|
|
**NOT an issue with this repo's codebase - this is expected behavior for function calling.**
|
|
|
|
### How it works:
|
|
|
|
1. **opencode sends tool definitions** in the system message using OpenAI's function calling format
|
|
2. **Each tool definition is ~450 tokens** (name + description + parameters)
|
|
3. **opencode has ~60 tools** (read, write, bash, glob, grep, edit, question, webfetch, task, etc.)
|
|
4. **Total tool definition tokens:** ~27,000 tokens
|
|
|
|
### Calculation:
|
|
```
|
|
Single tool definition: ~450 tokens
|
|
Number of tools: ~60
|
|
Tool schemas total: ~27,000 tokens
|
|
System message: ~500 tokens
|
|
User query: ~100 tokens
|
|
---
|
|
Total: ~27,600 tokens
|
|
```
|
|
|
|
**This matches the observed ~31k tokens.**
|
|
|
|
## Why This Happens
|
|
|
|
OpenAI's function calling protocol requires sending the **complete function schemas** to the LLM with every request. This is how the model:
|
|
- Knows what tools are available
|
|
- Understands parameter requirements
|
|
- Knows how to format tool calls
|
|
|
|
All major LLM providers using function calling work this way (OpenAI, Anthropic, local models, etc.).
|
|
|
|
## Verification
|
|
|
|
```bash
|
|
python -c "
|
|
import tiktoken
|
|
enc = tiktoken.get_encoding('cl100k_base')
|
|
|
|
# Example from actual opencode tool definition
|
|
read_tool_schema = '''{\"type\": \"function\", \"function\": {\"name\": \"read\", \"description\": \"Read a file or directory from the local filesystem...[full description]\", \"parameters\": {...}}}'''
|
|
|
|
print(f'Single tool schema: {len(enc.encode(read_tool_schema))} tokens')
|
|
print(f'Estimated 60 tools: {len(enc.encode(read_tool_schema)) * 60:,} tokens')
|
|
"
|
|
```
|
|
|
|
Result:
|
|
- Single tool definition: ~451 tokens
|
|
- 60 tools: ~27,060 tokens
|
|
- Plus system + user message: ~27,660 total
|
|
|
|
## This Is NOT a Bug
|
|
|
|
The 31k token context is **correct and expected** for function calling with 60+ tools. This is how:
|
|
- OpenAI API works
|
|
- Claude API works
|
|
- Local models with function calling work
|
|
|
|
## Potential Optimizations (Optional)
|
|
|
|
If reducing context size is critical, consider:
|
|
|
|
### Option 1: Dynamic Tool Selection
|
|
- Only send tools relevant to current task
|
|
- Example: For file operations, only send [read, write, glob, edit]
|
|
- Trade-off: Requires opencode to intelligently filter tools
|
|
|
|
### Option 2: Compressed Tool Descriptions
|
|
- Shorten tool descriptions to essentials
|
|
- Example: "Read file at path (required: filePath)"
|
|
- Trade-off: Model may make more errors with less guidance
|
|
|
|
### Option 3: Tool Grouping
|
|
- Group similar tools into single "tools: [read, write, glob]" parameter
|
|
- Trade-off: Breaks OpenAI compatibility
|
|
|
|
## Recommendation
|
|
|
|
**NO ACTION REQUIRED.** The 31k token context is:
|
|
- Standard for function calling with many tools
|
|
- Within capabilities of modern LLMs (32k-128k context windows)
|
|
- Not caused by this repo's code
|
|
|
|
The `.opencodeignore` created earlier will help with opencode's own system prompt, but doesn't affect the LLM context sent to local_swarm.
|
|
|
|
## Additional Finding
|
|
|
|
While investigating, verified:
|
|
- `config/prompts/tool_instructions.txt`: 125 tokens ✅
|
|
- This repo's tool execution code: No token bloat ✅
|
|
- Issue is purely opencode's function calling protocol ✅
|