# Investigation: 31k Token Context Issue
## Problem
When making requests through opencode to local_swarm, the LLM receives ~31k tokens of context, even for a trivial query such as listing an empty directory.
## Root Cause Identified
**NOT an issue with this repo's codebase - this is expected behavior for function calling.**
### How it works:
1. **opencode sends tool definitions** in the system message using OpenAI's function calling format (an abbreviated example follows this list)
2. **Each tool definition is ~450 tokens** (name + description + parameters)
3. **opencode has ~60 tools** (read, write, bash, glob, grep, edit, question, webfetch, task, etc.)
4. **Total tool definition tokens:** ~27,000 tokens
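For illustration, a single tool definition in OpenAI's function calling format looks roughly like the sketch below. The field contents are abbreviated here; the real `read` schema carries a much longer description and fuller parameter docs, which is where most of the ~450 tokens come from.
```python
# Abbreviated sketch of one tool definition in OpenAI's function calling
# format. Real opencode schemas have far longer descriptions and more
# parameters; the text fields dominate the per-tool token count.
read_tool = {
    "type": "function",
    "function": {
        "name": "read",
        "description": "Read a file or directory from the local filesystem...",
        "parameters": {
            "type": "object",
            "properties": {
                "filePath": {
                    "type": "string",
                    "description": "Absolute path of the file to read",
                },
            },
            "required": ["filePath"],
        },
    },
}
```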
### Calculation:
```
Single tool definition: ~450 tokens
Number of tools: ~60
Tool schemas total: ~27,000 tokens
System message: ~500 tokens
User query: ~100 tokens
---
Total: ~27,600 tokens
```
**This accounts for the bulk of the observed ~31k tokens.**
## Why This Happens
OpenAI's function calling protocol requires sending the **complete function schemas** to the LLM with every request. This is how the model:
- Knows what tools are available
- Understands parameter requirements
- Knows how to format tool calls
All major function calling implementations work this way (OpenAI, Anthropic, local models, etc.).
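Concretely, with the OpenAI Python client the full tool list is serialized into every request. A minimal sketch follows; the model name and the single-entry tool list are placeholders, and in opencode's case the list holds ~60 schemas:
```python
# Minimal sketch: the complete `tools` list rides along with every request,
# so its ~27k tokens are paid on each call regardless of the query.
from openai import OpenAI

client = OpenAI()

# Placeholder list; in opencode's case this holds ~60 schemas like `read`.
all_tools = [
    {
        "type": "function",
        "function": {
            "name": "read",
            "description": "Read a file from the local filesystem.",
            "parameters": {
                "type": "object",
                "properties": {"filePath": {"type": "string"}},
                "required": ["filePath"],
            },
        },
    },
]

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[{"role": "user", "content": "List the files in this directory."}],
    tools=all_tools,  # sent in full with every single request
)
print(response.choices[0].message)
```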
## Verification
```bash
python -c "
import tiktoken
enc = tiktoken.get_encoding('cl100k_base')
# Example from an actual opencode tool definition. The description and
# parameters are truncated here; the ~451-token figure below was measured
# against the full, untruncated schema.
read_tool_schema = '''{\"type\": \"function\", \"function\": {\"name\": \"read\", \"description\": \"Read a file or directory from the local filesystem...[full description]\", \"parameters\": {...}}}'''
print(f'Single tool schema: {len(enc.encode(read_tool_schema))} tokens')
print(f'Estimated 60 tools: {len(enc.encode(read_tool_schema)) * 60:,} tokens')
"
```
Result:
- Single tool definition: ~451 tokens
- 60 tools: ~27,060 tokens
- Plus system + user message: ~27,660 total
## This Is NOT a Bug
The 31k token context is **correct and expected** for function calling with 60+ tools. This is how:
- OpenAI API works
- Claude API works
- Local models with function calling work
## Potential Optimizations (Optional)
If reducing context size is critical, consider:
### Option 1: Dynamic Tool Selection
- Only send tools relevant to current task
- Example: For file operations, only send [read, write, glob, edit]
- Trade-off: Requires opencode to intelligently filter tools (see the sketch below)
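A minimal sketch of what such filtering might look like, assuming a hypothetical keyword-based task classifier; `TASK_TOOLSETS`, the categories, and the keywords are all illustrative, and opencode has no such mechanism today:
```python
# Hypothetical dynamic tool selection: send a task-relevant subset instead of
# all ~60 schemas. Categories, keywords, and toolsets are illustrative only.
TASK_TOOLSETS = {
    "shell": ["bash"],
    "search": ["grep", "glob", "read"],
    "file_ops": ["read", "write", "glob", "edit"],
}

def select_tools(query: str, all_tools: list[dict]) -> list[dict]:
    """Return only the tool schemas relevant to the query's task category."""
    q = query.lower()
    if any(w in q for w in ("run", "execute", "command")):
        names = TASK_TOOLSETS["shell"]
    elif any(w in q for w in ("find", "search", "grep")):
        names = TASK_TOOLSETS["search"]
    else:
        names = TASK_TOOLSETS["file_ops"]  # default: file operations
    return [t for t in all_tools if t["function"]["name"] in names]
```
Sending four ~450-token schemas instead of sixty would cut the tool overhead from ~27,000 to roughly 1,800 tokens.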
### Option 2: Compressed Tool Descriptions
- Shorten tool descriptions to essentials
- Example: "Read file at path (required: filePath)"
- Trade-off: Model may make more errors with less guidance (see the token comparison below)
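As a rough illustration of the savings, compare the token counts of a verbose and a terse description; both strings are invented for this comparison:
```python
# Rough illustration: shorter descriptions shrink each schema.
# Both description strings are made up for this comparison.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
verbose = (
    "Read a file or directory from the local filesystem. Accepts an absolute "
    "path, returns the contents as text, and raises an error if the path "
    "does not exist or is not readable by the current user."
)
terse = "Read file at path (required: filePath)"
print(f"verbose: {len(enc.encode(verbose))} tokens")
print(f"terse:   {len(enc.encode(terse))} tokens")
```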
### Option 3: Tool Grouping
- Group similar tools into single "tools: [read, write, glob]" parameter
- Trade-off: Breaks OpenAI compatibility (one possible shape is sketched below)
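One way such grouping might look is a single dispatcher schema with an enum of operations. This is illustrative only; the compatibility break is that OpenAI-style clients and tool runners expect one schema per tool:
```python
# Hypothetical grouped schema: one "fs" tool dispatching several operations.
# Illustrative only; OpenAI-style callers expect one schema per tool, so
# downstream tooling would need to understand this convention.
fs_tool = {
    "type": "function",
    "function": {
        "name": "fs",
        "description": "Filesystem operations: read, write, or glob.",
        "parameters": {
            "type": "object",
            "properties": {
                "op": {"type": "string", "enum": ["read", "write", "glob"]},
                "path": {"type": "string"},
                "content": {"type": "string", "description": "write only"},
            },
            "required": ["op", "path"],
        },
    },
}
```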
## Recommendation
**NO ACTION REQUIRED.** The 31k token context is:
- Standard for function calling with many tools
- Within the capabilities of modern LLMs (32k-128k context windows)
- Not caused by this repo's code
The `.opencodeignore` created earlier will help with opencode's own system prompt, but doesn't affect the LLM context sent to local_swarm.
## Additional Finding
While investigating, the following were also verified:
- `config/prompts/tool_instructions.txt`: 125 tokens ✅
- This repo's tool execution code: No token bloat ✅
- Issue is purely opencode's function calling protocol ✅