# Test Plan: Improved Tool Instructions

## Problem Statement
The model is not using tools effectively:
1. Creates a README instead of the actual project structure
2. Provides commands as text instead of executing them
3. Refuses to run commands, claiming "I am only an AI assistant"

## Root Cause Analysis
Current instructions don't clearly communicate:
- That the model SHOULD use tools proactively
- That execution is expected, not explanation
- The workflow: user request → tool execution → result

## Unit Tests (Instruction Verification)

### Test 1: Instruction Presence
- [ ] Verify instructions are injected into system message
- [ ] Verify instructions appear at the START of system message (priority position)
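
Both checks in Test 1 can be automated with a small helper. This is a minimal sketch, not the project's actual code: `TOOL_INSTRUCTIONS` holds placeholder text, `check_instruction_placement` is a hypothetical name, and the messages are assumed to be OpenAI-style `{"role", "content"}` dicts like those in the curl request in this plan.

```python
# Placeholder text; substitute the real instruction block under test.
TOOL_INSTRUCTIONS = "You have tools. Use them to act; do not just describe."


def check_instruction_placement(messages, instructions):
    """Return (present, at_start) for the first system message.

    present:  the instructions appear somewhere in the system message.
    at_start: the system message begins with the instructions.
    """
    system = next((m for m in messages if m.get("role") == "system"), None)
    if system is None:
        return (False, False)
    content = system.get("content", "")
    present = instructions in content
    at_start = content.lstrip().startswith(instructions)
    return (present, at_start)
```

A result of `(True, False)` means the instructions were injected but not in the priority position, which fails the second checkbox.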

### Test 2: Token Count
- [ ] Measure the total token count of the new instructions
- [ ] Verify the total is ≤ 500 tokens (conservative budget)
- [ ] Document the token count before and after the change
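
As a quick pre-check before measuring with the served model's own tokenizer, the budget can be approximated from character count. The sketch below assumes roughly 4 characters per token, which is only a rule of thumb for English text; `estimate_tokens` and `within_budget` are hypothetical helper names.

```python
import math

TOKEN_BUDGET = 500  # budget stated in Test 2


def estimate_tokens(text, chars_per_token=4.0):
    """Rough token estimate from character count.

    Assumption: about 4 characters per token. Use the real tokenizer
    for the numbers recorded in the before/after comparison.
    """
    return math.ceil(len(text) / chars_per_token)


def within_budget(text, budget=TOKEN_BUDGET):
    """True if the estimated token count does not exceed the budget."""
    return estimate_tokens(text) <= budget
```

Because the estimate can be off in either direction, instructions that land close to the budget should be re-measured with the actual tokenizer.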

### Test 3: Format Compliance
- [ ] Verify instructions include TOOL:/ARGUMENTS: format
- [ ] Verify examples use correct format
- [ ] Verify rules are clear and numbered
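
The format checks can be backed by a small parser. The exact wire format is not spelled out in this plan, so the sketch assumes a `TOOL: <name>` field followed by an `ARGUMENTS: <text>` field, separated by a newline or a comma; `parse_tool_call` is a hypothetical name and the pattern should be adjusted to whatever the instructions actually define.

```python
import re

# Assumed format: "TOOL: <name>" then "ARGUMENTS: <text>", separated by a
# newline or a comma. DOTALL lets the arguments span several lines
# (for example, file content passed to the write tool).
TOOL_CALL_RE = re.compile(
    r"TOOL:\s*(?P<tool>\w+)\s*(?:,|\n)\s*ARGUMENTS:\s*(?P<args>.+)",
    re.DOTALL,
)


def parse_tool_call(response):
    """Return (tool, arguments) for the first tool call, or None if absent."""
    match = TOOL_CALL_RE.search(response)
    if match is None:
        return None
    return match.group("tool"), match.group("args").strip()
```

Running the same parser over the examples embedded in the instructions covers the "examples use correct format" checkbox.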

## Integration Tests (Behavioral)

### Test 4: Project Creation Flow
**Input:** "Create a React Hello World app"

**Expected Behavior:**
1. Model responds with TOOL: bash, ARGUMENTS: mkdir myapp
2. After result, TOOL: write, ARGUMENTS: package.json content
3. After result, TOOL: write, ARGUMENTS: src/App.js content
4. Continue until complete project structure exists

**Failure Modes:**
- [ ] Model only describes what to do
- [ ] Model creates README only
- [ ] Model refuses to execute commands
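
After the Test 4 conversation finishes, the result can be checked on disk rather than by reading the transcript. The sketch below is one possible check: `missing_project_files` is a hypothetical helper, and the expected file list contains only the two files named in the expected behavior, so extend it to match what a complete project means for your run.

```python
from pathlib import Path

# Files named in the expected behavior for Test 4 (relative to the project root).
EXPECTED_FILES = ("package.json", "src/App.js")


def missing_project_files(project_dir, expected=EXPECTED_FILES):
    """Return the expected files that do not exist under project_dir."""
    root = Path(project_dir)
    return [rel for rel in expected if not (root / rel).is_file()]
```

An empty list means the structure exists. A list containing every expected file, next to a lone README, points to the "creates README only" failure mode.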

### Test 5: Multi-step Task
**Input:** "Check what files exist, then create a test.txt file with 'hello' in it"

**Expected Behavior:**
1. TOOL: bash, ARGUMENTS: ls -la
2. Wait for result
3. TOOL: write, ARGUMENTS: test.txt with "hello"

**Failure Modes:**
- [ ] Model tries to do both in one response
- [ ] Model doesn't wait for ls result before writing
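
The first failure mode in Test 5 (both actions in one response) can be detected by counting tool calls per response. This is a rough sketch that assumes every tool call starts on its own line with `TOOL:`; `count_tool_calls` is a hypothetical name.

```python
import re

# Assumption: each tool call begins on its own line with "TOOL:".
TOOL_LINE_RE = re.compile(r"^[ \t]*TOOL:", re.MULTILINE)


def count_tool_calls(response):
    """Count the tool calls that start within a single model response."""
    return len(TOOL_LINE_RE.findall(response))
```

For a step-by-step flow, each response is expected to yield a count of exactly 1; a count of 2 on the first turn means the model did not wait for the `ls` result.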

### Test 6: Command Refusal
**Input:** "Run npm install"

**Expected Behavior:**
1. TOOL: bash, ARGUMENTS: npm install

**Failure Modes:**
- [ ] Model responds: "I cannot run commands, I am only an AI assistant"
- [ ] Model explains npm install instead of running it
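
Refusals can be flagged automatically with a phrase match. The sketch below uses the phrases quoted in this plan plus their contracted forms; the list is an assumption and will not catch every way a model can decline, so treat it as a first filter. `is_refusal` is a hypothetical name.

```python
import re

# Phrases taken from this plan, plus contracted variants (an assumption).
REFUSAL_PATTERNS = [
    r"\bI cannot\b",
    r"\bI can't\b",
    r"\bI am only an AI\b",
    r"\bI'm only an AI\b",
]
REFUSAL_RE = re.compile("|".join(REFUSAL_PATTERNS), re.IGNORECASE)


def is_refusal(response):
    """True if the response contains one of the known refusal phrases."""
    return REFUSAL_RE.search(response) is not None
```

A phrase match can also fire on legitimate output (for example, file content that happens to contain "I cannot"), so flagged responses are worth a manual look before counting them as failures.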

## Manual Verification Commands

```bash
# Start the server
python main.py --auto

# In another terminal, test with curl
curl -X POST http://localhost:17615/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "local-swarm",
    "messages": [{"role": "user", "content": "Create a React Hello World app"}],
    "tools": [{"type": "function", "function": {"name": "bash", "description": "Run shell commands"}}, {"type": "function", "function": {"name": "write", "description": "Write files"}}]
  }'
```

## Success Criteria

1. **Proactivity:** Model uses tools on the first request, without needing a follow-up prompt
2. **Execution:** Model runs commands, doesn't just describe them
3. **No Refusal:** Model never says "I cannot" or "I am only an AI"
4. **Completeness:** Multi-file projects are fully created via tools
5. **Format:** 100% of tool calls use correct TOOL:/ARGUMENTS: format

## Metrics

- **Tool usage rate:** % of requests that result in at least one tool call
- **Format compliance:** % of tool calls in correct format
- **Completion rate:** % of multi-step tasks fully completed
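
The three metrics can be computed from per-request records. The sketch below is one way to do it under stated assumptions: `RunResult` and its fields are a hypothetical record shape, and how a task is judged "completed" is left to the test harness.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    tool_calls: int         # tool calls the model emitted for this request
    well_formed_calls: int  # how many of those used the expected format
    multi_step: bool        # whether the request was a multi-step task
    completed: bool         # whether the task reached its expected end state


def pct(numerator, denominator):
    """Percentage, defined as 0.0 when the denominator is zero."""
    return 100.0 * numerator / denominator if denominator else 0.0


def compute_metrics(results):
    """Return the three metrics from this plan as percentages."""
    total_calls = sum(r.tool_calls for r in results)
    multi = [r for r in results if r.multi_step]
    return {
        "tool_usage_rate": pct(sum(1 for r in results if r.tool_calls > 0), len(results)),
        "format_compliance": pct(sum(r.well_formed_calls for r in results), total_calls),
        "completion_rate": pct(sum(1 for r in multi if r.completed), len(multi)),
    }
```

Recording the same metrics before and after the instruction change gives a direct comparison against the success criteria.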