# Development Patterns Analysis

## Circular Development Issues Identified

### 1. Tool Execution Architecture (15+ commits going in circles)

**The Cycle:**
```
Add server-side tool execution → Fix looping issues → Remove/simplify instructions 
→ Tools don't work → Add tool host → Return tool_calls to client (reversal) 
→ Execute server-side again (reversal back) → Fix parsing → Simplify format 
→ Enhance instructions → Add streaming support → Fix streaming format...
```

**Commits showing the cycle:**
- `00cd483` - Add server-side tool execution  
- `df4587e` - Fix: prevent looping (checking for server-side results)
- `c70f83a` - Fix: simplify looping prevention  
- `1b181bf` - Fix: remove tool instructions (40k → 0 tokens)
- `bad8732` - Fix: simplify to ~300 tokens
- `12eaac0` - Add distributed tool host
- `b7fc184` - **REVERSAL:** Return tool_calls to opencode (not server-side)
- `f83e6fc` - **REVERSAL BACK:** Execute via tool executor
- `aa137b6` - Fix: handle tool_calls as single object or array
- `539ca21` - Simplify format to TOOL:/ARGUMENTS: pattern
- `aabd2b2` - Enhance instructions for multi-step operations

**Root Cause:** No clear architectural decision on:
- Who executes tools? (Server vs Client)
- What format? (JSON vs text patterns vs markdown)
- When to add instructions? (Always vs first request vs never)

### 2. Tool Instruction Token Count (4 changes)

```
40,000 tokens → 300 tokens → removed → enhanced (unknown count)
```

**Problem:** No testing to validate if instructions actually work.

### 3. Tool Parsing (8+ fixes)

Multiple commits fixing the same parsing issues:
- `c5b8196` - Parse nested JSON in arguments
- `76b12b3` - Parse JavaScript-style output  
- `9d838c1` - Handle markdown code blocks
- `e3701cf` - Extract content before tool_calls block
- `aa137b6` - Handle single object or array
- `539ca21` - Simplify to TOOL:/ARGUMENTS: pattern

**Problem:** No unit tests for parsing. Each fix only handles one case.

### 4. Streaming + Tools (4 commits)

```
Disable streaming when tools present → Add to streaming path → Fix SSE format
```

**Problem:** Two completely different code paths that diverge and need separate fixes.

### 5. Debugging Commits (6 commits)

Commits that only add debug logging:
- `e0c500e` - "very visible request/response logging"
- `25b675c` - "explicit logging for tool executor configuration"
- `27e1971` - "response logging to both paths"
- `e3eb52d` - "log message state"
- `13e6fb2` - "add logging to tool call parsing"
- `3039629` - "log request.tools"

**Problem:** Debugging in production instead of having tests.

## Why This Happens

### 1. No Tests
- **Impact:** Every change requires manual testing
- **Result:** Fixes break other cases, regressions common
- **Evidence:** 25+ commits fixing tool-related issues

### 2. Production Debugging
- **Pattern:** Add debug logging → Fix → Remove debug logging
- **Commits:** `e0c500e`, `3728eb7` (add then clean up)
- **Better:** Unit tests with mocked LLM responses

### 3. Architectural Ambiguity
- **Question:** Who owns tool execution?
- **Server-side:** Better for simple providers
- **Client-side:** Better for complex opencode integration
- **Actual:** Switched back and forth 3+ times

### 4. Feature Interaction Complexity
- Tools + Streaming = Two paths to maintain
- Tools + Federation = Distributed execution complexity  
- Tools + Different formats = Parsing nightmare

### 5. Unclear Requirements
- Should instructions be in system prompt or user prompt?
- How many tokens is acceptable?
- What format should tools return?

## Recommendations to Prevent This

### Immediate (Prevents Next Cycle)

1. **Pick One Architecture**
   - Decision: Server-side execution via tool executor
   - Document why in ARCHITECTURE.md

2. **Token Budget**
   - Max 2000 tokens for tool instructions
   - Test with actual 16K context models
   - Never exceed 50% of context window

3. **One Format Only**
   - Standardize on: `TOOL: name\nARGUMENTS: {"key": "value"}`
   - Remove all other parsing code
   - Single regex pattern

4. **Add Unit Tests**
    ```python
    # test_tool_parsing.py
    def test_parse_simple_tool():
        text = "TOOL: read\nARGUMENTS: {\"filePath\": \"test.txt\"}"
        content, tools = parse_tool_calls(text)
        assert len(tools) == 1
        assert tools[0]["function"]["name"] == "read"
    
    def test_parse_no_tool():
        text = "Just a regular response"
        content, tools = parse_tool_calls(text)
        assert len(tools) == 0
        assert content == text
    
    def test_parse_multiple_tools():
        text = "TOOL: read\nARGUMENTS: {...}\n\nTOOL: write\nARGUMENTS: {...}"
        content, tools = parse_tool_calls(text)
        assert len(tools) == 2
    ```

5. **Integration Test Script**
    ```bash
    # test_tools.sh
    python main.py --auto --test-tools
    # Tests: read file → write file → bash command
    # Exits with error code if any fail
    ```

6. **Simplify Tool Instructions**
    - Current: ~300 tokens with 5 examples
    - Target: ~100 tokens with 2 examples
    - Include: read, write only (bash is obvious)

### Medium-term

7. **Separate Concerns**
   ```
   src/tools/
   ├── parser.py      # Only parsing logic
   ├── executor.py    # Only execution logic  
   ├── formatter.py   # Only formatting instructions
   └── integration.py # Only API integration
   ```

8. **Design Doc Before Code**
   - For tool system changes, write 1-page design first
   - Include: format, token count, examples, test plan
   - Get it right on paper before coding

9. **Feature Flags**
   ```python
   # config.py
   USE_SERVER_SIDE_TOOLS = True  # Can toggle without code changes
   TOOL_INSTRUCTION_VERSION = "v2"  # A/B test formats
   ```

### Long-term

10. **CI/CD Pipeline**
    - Run tests on every PR
    - Block merge if tests fail
    - Include: unit tests, integration tests, token count check

11. **Observability**
    - Structured logging (not print statements)
    - Metrics: tool success rate, parsing errors, latency
    - Dashboard to see issues before users report them

## Current State Assessment

**Good:**
- Tool executor abstraction exists
- Distributed tool execution works
- Working directory handling improved
- Timeout handling for package managers

**Needs Work:**
- Too many parsing code paths (simplify to one)
- Instructions too long (reduce to <2000 tokens)
- No automated testing
- Debug logging still in production code

## Suggested Immediate Actions

1. Merge current cleanup branch (already done ✓)
2. Remove all but one parsing format (done ✓)
3. Reduce tool instructions to <2000 tokens (done ✓)
4. Add unit tests for tool parsing (done ✓)
5. Add integration test for tool execution

## Success Metrics

- Tool-related commits stabilize to <2 per month
- Zero "fix: prevent looping" commits
- All tool changes include tests
- Instructions stay under 2000 tokens