# Test Plan: Improved Tool Instructions

## Problem Statement

Model is not using tools effectively:

1. Creates README instead of actual project structure
2. Provides commands as text instead of executing them
3. Refuses to run commands claiming "I am only an AI assistant"

## Root Cause Analysis

Current instructions don't clearly communicate:

- That the model SHOULD use tools proactively
- That execution is expected, not explanation
- The workflow: user request → tool execution → result

## Unit Tests (Instruction Verification)

### Test 1: Instruction Presence

- [ ] Verify instructions are injected into system message
- [ ] Verify instructions appear at the START of system message (priority position)

### Test 2: Token Count

- [ ] Measure total token count of new instructions
- [ ] Verify ≤ 500 tokens (conservative budget)
- [ ] Document before/after

### Test 3: Format Compliance

- [ ] Verify instructions include TOOL:/ARGUMENTS: format
- [ ] Verify examples use correct format
- [ ] Verify rules are clear and numbered

## Integration Tests (Behavioral)

### Test 4: Project Creation Flow

**Input:** "Create a React Hello World app"

**Expected Behavior:**

1. Model responds with TOOL: bash, ARGUMENTS: mkdir myapp
2. After result, TOOL: write, ARGUMENTS: package.json content
3. After result, TOOL: write, ARGUMENTS: src/App.js content
4. Continue until complete project structure exists

**Failure Modes:**

- [ ] Model only describes what to do
- [ ] Model creates README only
- [ ] Model refuses to execute commands

### Test 5: Multi-step Task

**Input:** "Check what files exist, then create a test.txt file with 'hello' in it"

**Expected Behavior:**

1. TOOL: bash, ARGUMENTS: ls -la
2. Wait for result
3. TOOL: write, ARGUMENTS: test.txt with "hello"

**Failure Modes:**

- [ ] Model tries to do both in one response
- [ ] Model doesn't wait for ls result before writing

### Test 6: Command Refusal

**Input:** "Run npm install"

**Expected Behavior:**

1. TOOL: bash, ARGUMENTS: npm install

**Failure Modes:**

- [ ] Model responds: "I cannot run commands, I am only an AI assistant"
- [ ] Model explains npm install instead of running it

## Manual Verification Commands

```bash
# Start the server
python main.py --auto

# In another terminal, test with curl
curl -X POST http://localhost:17615/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "local-swarm",
    "messages": [{"role": "user", "content": "Create a React Hello World app"}],
    "tools": [
      {"type": "function", "function": {"name": "bash", "description": "Run shell commands"}},
      {"type": "function", "function": {"name": "write", "description": "Write files"}}
    ]
  }'
```

## Success Criteria

1. **Proactivity:** Model uses tools without being asked twice
2. **Execution:** Model runs commands, doesn't just describe them
3. **No Refusal:** Model never says "I cannot" or "I am only an AI"
4. **Completeness:** Multi-file projects are fully created via tools
5. **Format:** 100% of tool calls use correct TOOL:/ARGUMENTS: format

## Metrics

- **Tool usage rate:** % of requests that result in tool calls
- **Format compliance:** % of tool calls in correct format
- **Completion rate:** % of multi-step tasks fully completed
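
The tool usage rate and format compliance metrics, along with the refusal check from Test 6, could be computed from a batch of collected model responses roughly as follows. This is a minimal sketch, not the project's actual scoring code: the function name `score_responses` is hypothetical, and the exact wire format is an assumption (a `TOOL: <name>` line immediately followed by an `ARGUMENTS: <text>` line), since this plan does not spell it out. Completion rate is left out because it depends on inspecting the resulting files rather than the response text.

```python
import re

# ASSUMPTION: a well-formed call is a "TOOL: <name>" line immediately
# followed by an "ARGUMENTS: <text>" line. Adjust to the real format.
TOOL_CALL_RE = re.compile(
    r"^TOOL:[ \t]*(?P<tool>\S+)[ \t]*\n^ARGUMENTS:[ \t]*(?P<args>.+)$",
    re.MULTILINE,
)
# Any line that starts a tool call, well-formed or not.
TOOL_LINE_RE = re.compile(r"^TOOL:", re.MULTILINE)
# Refusal phrases taken from the Test 6 failure modes and success criterion 3.
REFUSAL_RE = re.compile(r"I cannot|I am only an AI", re.IGNORECASE)


def score_responses(responses):
    """Return tool usage rate, format compliance, and refusal count.

    `responses` is a list of raw model response strings, one per request.
    """
    attempted = 0    # responses containing at least one TOOL: line
    tool_lines = 0   # every TOOL: line seen across all responses
    well_formed = 0  # TOOL: lines followed by a valid ARGUMENTS: line
    refusals = 0     # responses containing a refusal phrase

    for text in responses:
        n_lines = len(TOOL_LINE_RE.findall(text))
        n_valid = len(TOOL_CALL_RE.findall(text))
        if n_lines:
            attempted += 1
        tool_lines += n_lines
        well_formed += n_valid
        if REFUSAL_RE.search(text):
            refusals += 1

    return {
        "tool_usage_rate": attempted / len(responses) if responses else 0.0,
        "format_compliance": well_formed / tool_lines if tool_lines else 0.0,
        "refusals": refusals,
    }


# Hypothetical sample responses, one per failure or success mode.
sample = [
    "TOOL: bash\nARGUMENTS: npm install",                # correct call
    "I cannot run commands, I am only an AI assistant",  # refusal
    "TOOL: write\nsrc/App.js",                           # missing ARGUMENTS:
    "You should run npm install in your terminal.",      # describes only
]
print(score_responses(sample))
```

Counting malformed `TOOL:` lines separately from well-formed calls keeps the two metrics independent: a model that attempts tools but garbles the format raises the usage rate while lowering format compliance.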