# Design Decision: Improved Tool Instructions

**Date:** 2024-02-24
**Scope:** src/api/routes.py tool_instructions
**Lines Changed:** ~25 lines

## Problem

The current tool instructions (~125 tokens) fail to communicate key behavioral expectations:

1. **Passive vs. active:** The model describes what to do instead of doing it
2. **Refusal:** The model claims "I am only an AI assistant" instead of executing
3. **Incomplete output:** Multi-file projects result in a README only

Evidence from a user report:

- Request: "Create React Hello World app"
- Result: README only (not actual files)
- Subsequent turns: commands given as text, not executed
- Final turn: "I am only an AI assistant" refusal

## Root Cause Analysis

The instructions lack:

1. **Authority statement** - "You CAN and SHOULD use tools"
2. **Execution mandate** - "Execute commands, don't just describe them"
3. **Workflow clarity** - Clear step-by-step expectations
4. **Anti-pattern examples** - What NOT to do

## Options Considered

### Option 1: Minor Tweaks

Add a few lines to the existing instructions.

- **Pros:** Minimal token increase
- **Cons:** Band-aid fix; may not solve the root cause
- **Verdict:** REJECTED - doesn't address the behavioral issue

### Option 2: Complete Rewrite with Strong Mandate

Rewrite the instructions to emphasize:

- Proactive tool usage
- Execution over explanation
- A clear workflow
- Anti-patterns to avoid

- **Pros:** Addresses the root cause; clear behavioral guidance
- **Cons:** Higher token count (estimated 300-400 tokens)
- **Verdict:** ACCEPTED - proper fix for the behavioral issue

### Option 3: Few-Shot Examples

Include full conversation examples in the instructions.

- **Pros:** Shows exactly what to do
- **Cons:** Very high token count (1000+ tokens); may confuse the model
- **Verdict:** REJECTED - violates the token budget

## Decision

Implement Option 2: rewrite the instructions with an emphasis on proactivity and execution.

**Key additions:**

1. **Capability statement:** "You have tools. Use them."
2. **Execution mandate:** "Don't describe, execute"
3. **Workflow:** Clear request → tool → result → next cycle
4. **Anti-patterns:** Explicitly forbid "I cannot" responses

## Impact

### Token Budget (Exact Count - cl100k_base)

- **Current:** 478 tokens (1,810 characters)
- **Status:** Within the 2,000-token limit ✓
- **Status:** Within the conservative 500-token estimate ✓
- **Context window:** A 16K model leaves ~15.5K tokens for user input ✓
- **Code comment:** Token count documented in src/api/routes.py ✓

### Code Changes

- **File:** src/api/routes.py
- **Lines:** +48/-18 (net +30)
- **Type:** Instructions replacement
- **Token documentation:** Added an inline comment with the exact token count

### Breaking Changes

- **None** - the instructions are additive and clearer, not a different format

### Behavioral Changes

- **Expected:** More proactive tool usage
- **Expected:** No more "I cannot" refusals
- **Expected:** Multi-step projects completed via tools
- **Expected:** Commands executed, not described

### Review Blockers Addressed

- ✅ Token budget documented in code (REVIEW-2026-02-24-cleanup-branch.md Blocker #1)
- ✅ Exact token count calculated using tiktoken (478 tokens)
- ✅ Token documentation added to src/api/routes.py (REVIEW-2026-02-24-cleanup-branch.md Blocker #2)
- ⚠️ Debug print statements - existing code, noted in DEVELOPMENT_PATTERNS.md as "still needs work" (not in scope of this change)
- ⚠️ Integration tests for instruction effectiveness - require running an actual LLM; covered by parsing tests

## Implementation

```python
tool_instructions = """You are an AI assistant with TOOL ACCESS.
You CAN and MUST use tools to accomplish tasks.

**CRITICAL RULES:**
1. When asked to do something, EXECUTE it using tools - don't just describe how
2. You MUST use the bash tool to run commands like mkdir, npm, ls, etc.
3. You MUST use the write tool to create files
4. NEVER say "I cannot" or "I am only an AI assistant" - you HAVE tools, USE them
5. Complete tasks FULLY - don't stop at README, create ALL required files

**AVAILABLE TOOLS:**
- read: Read file content
- write: Create/overwrite files
- bash: Execute shell commands (npm, mkdir, ls, etc.)

**TOOL FORMAT (STRICT):**
TOOL: tool_name
ARGUMENTS: {"param": "value"}

**WORKFLOW:**
1. User asks for something
2. You decide what tool to use
3. You respond with ONLY the TOOL: format above
4. You receive the tool result
5. You continue with the next tool until the task is COMPLETE

**EXAMPLES:**

Creating a project:
User: "Create a React app"
You: TOOL: bash
ARGUMENTS: {"command": "mkdir myapp && cd myapp && npm init -y"}
[wait for result]
You: TOOL: write
ARGUMENTS: {"filePath": "myapp/package.json", "content": "..."}
[continue until all files created]

Running commands:
User: "Install dependencies"
You: TOOL: bash
ARGUMENTS: {"command": "npm install"}
[wait for result, then confirm completion]

**WHAT NOT TO DO:**
- ❌ "To create a React app, you should run: mkdir myapp" (describing)
- ❌ "I cannot run commands, I am an AI" (refusing)
- ❌ Creating only a README instead of the full project (incomplete)
- ❌ "First do X, then do Y" (giving instructions instead of doing)

**CORRECT BEHAVIOR:**
- ✅ Execute the command immediately using the bash tool
- ✅ Create all files using the write tool
- ✅ Continue until the task is 100% complete
- ✅ Use ONE tool at a time and wait for results"""
```

## Testing

1. Test with a React Hello World request
2. Verify the model uses bash to create the directory structure
3. Verify the model uses write to create all files
4. Verify no "I cannot" responses

## Rollback Plan

If the new instructions cause issues:

1. Revert to the previous ~125-token version
2. Analyze what specifically failed
3. Iterate on smaller changes

## Success Metrics

- [ ] Model uses tools on the first request (not after prompting)
- [ ] Zero "I cannot" or "I am an AI" responses
- [ ] Multi-file projects fully created
- [ ] Commands executed, not described
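The parsing tests mentioned under Review Blockers can validate the strict `TOOL:`/`ARGUMENTS:` format without running an actual LLM. A minimal sketch of such a validator — `parse_tool_call`, `TOOL_CALL_RE`, and `ALLOWED_TOOLS` are illustrative names, not the actual code in src/api/routes.py:

```python
import json
import re

# Matches the strict two-line format the instructions mandate:
#   TOOL: tool_name
#   ARGUMENTS: {"param": "value"}
TOOL_CALL_RE = re.compile(
    r"^TOOL:\s*(?P<name>\w+)\s*\nARGUMENTS:\s*(?P<args>\{.*\})\s*$",
    re.DOTALL,
)

# The three tools listed in the instructions above.
ALLOWED_TOOLS = {"read", "write", "bash"}

def parse_tool_call(response: str):
    """Return (tool_name, arguments) for a well-formed tool call, else None."""
    match = TOOL_CALL_RE.match(response.strip())
    if match is None:
        return None  # plain prose, e.g. a description or a refusal
    name = match.group("name")
    if name not in ALLOWED_TOOLS:
        return None  # unknown tool name
    try:
        args = json.loads(match.group("args"))
    except json.JSONDecodeError:
        return None  # ARGUMENTS was not valid JSON
    return (name, args)
```

A reply like `'TOOL: bash\nARGUMENTS: {"command": "npm install"}'` parses to `("bash", {"command": "npm install"})`, while descriptive prose yields `None` — which is exactly the distinction the "executed, not described" metric needs.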
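The "zero refusals" success metric can also be checked mechanically over captured model replies. A rough sketch, assuming refusals follow the phrasings quoted in this document — the pattern list is illustrative and would need tuning against real logs:

```python
import re

# Refusal phrasings quoted in this document's problem report;
# extend this list as new variants show up in transcripts.
REFUSAL_PATTERNS = [
    r"\bI cannot\b",
    r"\bI am only an AI assistant\b",
    r"\bI am an AI\b",
]
REFUSAL_RE = re.compile("|".join(REFUSAL_PATTERNS), re.IGNORECASE)

def is_refusal(response: str) -> bool:
    """True if a model reply matches a known refusal phrasing."""
    return REFUSAL_RE.search(response) is not None
```

Running this over a batch of test transcripts gives a simple refusal count to track against the "zero refusals" target; a substring check would misfire inside longer words, hence the `\b` word boundaries.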