# Development Patterns Analysis

## Circular Development Issues Identified

### 1. Tool Execution Architecture (15+ commits going in circles)

**The Cycle:**

```
Add server-side tool execution → Fix looping issues → Remove/simplify instructions →
Tools don't work → Add tool host → Return tool_calls to client (reversal) →
Execute server-side again (reversal back) → Fix parsing → Simplify format →
Enhance instructions → Add streaming support → Fix streaming format...
```

**Commits showing the cycle:**

- `00cd483` - Add server-side tool execution
- `df4587e` - Fix: prevent looping (checking for server-side results)
- `c70f83a` - Fix: simplify looping prevention
- `1b181bf` - Fix: remove tool instructions (40k → 0 tokens)
- `bad8732` - Fix: simplify to ~300 tokens
- `12eaac0` - Add distributed tool host
- `b7fc184` - **REVERSAL:** Return tool_calls to opencode (not server-side)
- `f83e6fc` - **REVERSAL BACK:** Execute via tool executor
- `aa137b6` - Fix: handle tool_calls as single object or array
- `539ca21` - Simplify format to TOOL:/ARGUMENTS: pattern
- `aabd2b2` - Enhance instructions for multi-step operations

**Root Cause:** No clear architectural decision on:

- Who executes tools? (Server vs client)
- What format? (JSON vs text patterns vs markdown)
- When to add instructions? (Always vs first request vs never)

### 2. Tool Instruction Token Count (4 changes)

```
40,000 tokens → 300 tokens → removed → enhanced (unknown count)
```

**Problem:** No testing to validate whether the instructions actually work.

### 3. Tool Parsing (8+ fixes)

Multiple commits fixing the same parsing issues:

- `c5b8196` - Parse nested JSON in arguments
- `76b12b3` - Parse JavaScript-style output
- `9d838c1` - Handle markdown code blocks
- `e3701cf` - Extract content before tool_calls block
- `aa137b6` - Handle single object or array
- `539ca21` - Simplify to TOOL:/ARGUMENTS: pattern

**Problem:** No unit tests for parsing. Each fix only handles one case.
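The parsing fixes above could collapse into one code path. A minimal sketch of a single-pattern parser for the `TOOL:`/`ARGUMENTS:` format from commit `539ca21` — the function name `parse_tool_calls` follows the tests in this document, but the regex and the return shape are assumptions, not the project's actual code:

```python
import json
import re

# Hypothetical single-pattern parser for the TOOL:/ARGUMENTS: format.
# One regex replaces the eight accumulated parsing paths.
TOOL_RE = re.compile(r"^TOOL: (\S+)\nARGUMENTS: (\{.*\})\s*$", re.MULTILINE)

def parse_tool_calls(text):
    """Return (plain_content, tool_calls) extracted from an LLM reply."""
    tools = []
    for match in TOOL_RE.finditer(text):
        name, raw_args = match.groups()
        try:
            args = json.loads(raw_args)
        except json.JSONDecodeError:
            continue  # malformed arguments: leave as plain text
        tools.append({"function": {"name": name, "arguments": args}})
    content = TOOL_RE.sub("", text).strip()
    return content, tools
```

Because every tool call must match this one pattern, a fix for one malformed case cannot silently break another — the failure mode is always "no match, treated as plain text."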
### 4. Streaming + Tools (4 commits)

```
Disable streaming when tools present → Add to streaming path → Fix SSE format
```

**Problem:** Two completely different code paths that diverge and need separate fixes.

### 5. Debugging Commits (6 commits)

Commits that only add debug logging:

- `e0c500e` - "very visible request/response logging"
- `25b675c` - "explicit logging for tool executor configuration"
- `27e1971` - "response logging to both paths"
- `e3eb52d` - "log message state"
- `13e6fb2` - "add logging to tool call parsing"
- `3039629` - "log request.tools"

**Problem:** Debugging in production instead of having tests.

## Why This Happens

### 1. No Tests

- **Impact:** Every change requires manual testing
- **Result:** Fixes break other cases, regressions are common
- **Evidence:** 25+ commits fixing tool-related issues

### 2. Production Debugging

- **Pattern:** Add debug logging → fix → remove debug logging
- **Commits:** `e0c500e`, `3728eb7` (add, then clean up)
- **Better:** Unit tests with mocked LLM responses

### 3. Architectural Ambiguity

- **Question:** Who owns tool execution?
- **Server-side:** Better for simple providers
- **Client-side:** Better for complex opencode integration
- **Actual:** Switched back and forth 3+ times

### 4. Feature Interaction Complexity

- Tools + streaming = two paths to maintain
- Tools + federation = distributed execution complexity
- Tools + different formats = parsing nightmare

### 5. Unclear Requirements

- Should instructions be in the system prompt or the user prompt?
- How many tokens are acceptable?
- What format should tools return?

## Recommendations to Prevent This

### Immediate (Prevents Next Cycle)

1. **Pick One Architecture**
   - Decision: server-side execution via tool executor
   - Document why in ARCHITECTURE.md

2. **Token Budget**
   - Max 2000 tokens for tool instructions
   - Test with actual 16K context models
   - Never exceed 50% of context window
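A budget like this is easy to enforce mechanically. A hedged sketch of such a check, assuming a rough 4-characters-per-token estimate in place of a real tokenizer (`check_token_budget` and the heuristic are illustrative, not existing project code):

```python
# Hypothetical budget check for tool instructions. The 4-chars-per-token
# ratio is a crude heuristic; a real tokenizer would be more accurate.
MAX_INSTRUCTION_TOKENS = 2000

def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token."""
    return max(1, len(text) // 4)

def check_token_budget(instructions: str, context_window: int) -> None:
    """Raise if tool instructions exceed the cap or half the context."""
    tokens = estimate_tokens(instructions)
    if tokens > MAX_INSTRUCTION_TOKENS:
        raise ValueError(
            f"instructions use ~{tokens} tokens (max {MAX_INSTRUCTION_TOKENS})"
        )
    if tokens > context_window // 2:
        raise ValueError(
            f"instructions use ~{tokens} tokens, over half of the "
            f"{context_window}-token context window"
        )
```

Run in CI, a check like this would have flagged the 40,000-token instruction set before it shipped.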
3. **One Format Only**
   - Standardize on: `TOOL: name\nARGUMENTS: {"key": "value"}`
   - Remove all other parsing code
   - Single regex pattern

4. **Add Unit Tests**

```python
# test_tool_parsing.py
def test_parse_simple_tool():
    text = "TOOL: read\nARGUMENTS: {\"filePath\": \"test.txt\"}"
    content, tools = parse_tool_calls(text)
    assert len(tools) == 1
    assert tools[0]["function"]["name"] == "read"

def test_parse_no_tool():
    text = "Just a regular response"
    content, tools = parse_tool_calls(text)
    assert len(tools) == 0
    assert content == text

def test_parse_multiple_tools():
    text = "TOOL: read\nARGUMENTS: {...}\n\nTOOL: write\nARGUMENTS: {...}"
    content, tools = parse_tool_calls(text)
    assert len(tools) == 2
```

5. **Integration Test Script**

```bash
# test_tools.sh
python main.py --auto --test-tools
# Tests: read file → write file → bash command
# Exits with an error code if any step fails
```

6. **Simplify Tool Instructions**
   - Current: ~300 tokens with 5 examples
   - Target: ~100 tokens with 2 examples
   - Include: read and write only (bash is obvious)

### Medium-term

7. **Separate Concerns**

```
src/tools/
├── parser.py       # Only parsing logic
├── executor.py     # Only execution logic
├── formatter.py    # Only formatting instructions
└── integration.py  # Only API integration
```

8. **Design Doc Before Code**
   - For tool system changes, write a 1-page design first
   - Include: format, token count, examples, test plan
   - Get it right on paper before coding

9. **Feature Flags**

```python
# config.py
USE_SERVER_SIDE_TOOLS = True      # Can toggle without code changes
TOOL_INSTRUCTION_VERSION = "v2"   # A/B test formats
```

### Long-term

10. **CI/CD Pipeline**
    - Run tests on every PR
    - Block merge if tests fail
    - Include: unit tests, integration tests, token count check
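One class of unit test such a pipeline should run is the mocked-LLM test suggested under "Production Debugging" above, since it catches looping regressions (the `df4587e`/`c70f83a` class of bug) without any production logging. A sketch under stated assumptions — `FakeLLM`, `run_tool_loop`, and the loop protocol are all illustrative names, not the project's actual API:

```python
# Hypothetical: unit-testing the tool loop against canned LLM replies,
# so "prevent looping" bugs surface in tests rather than in production.
class FakeLLM:
    def __init__(self, replies):
        self._replies = iter(replies)

    def complete(self, _prompt: str) -> str:
        return next(self._replies)

def run_tool_loop(llm, execute_tool, max_steps: int = 5) -> str:
    """Call the LLM, feeding tool results back until a plain reply."""
    prompt = "start"
    for _ in range(max_steps):
        reply = llm.complete(prompt)
        if not reply.startswith("TOOL:"):
            return reply              # plain answer: we are done
        prompt = execute_tool(reply)  # run the tool, feed result back
    raise RuntimeError("tool loop did not terminate")

def test_loop_terminates():
    llm = FakeLLM(['TOOL: read\nARGUMENTS: {"filePath": "a.txt"}', "done"])
    assert run_tool_loop(llm, execute_tool=lambda call: "contents") == "done"
```

The `max_steps` cap turns an infinite loop into a deterministic test failure, which is exactly the property the looping-fix commits were chasing by hand.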
11. **Observability**
    - Structured logging (not print statements)
    - Metrics: tool success rate, parsing errors, latency
    - Dashboard to see issues before users report them

## Current State Assessment

**Good:**

- Tool executor abstraction exists
- Distributed tool execution works
- Working directory handling improved
- Timeout handling for package managers

**Needs Work:**

- Too many parsing code paths (simplify to one)
- Instructions too long (reduce to <2000 tokens)
- No automated testing
- Debug logging still in production code

## Suggested Immediate Actions

1. ✅ Merge current cleanup branch
2. ✅ Remove all but one parsing format
3. ✅ Reduce tool instructions to <2000 tokens
4. ✅ Add unit tests for tool parsing
5. ✅ Major refactoring completed (see below)

## Refactoring Success (Completed)

### Major Architectural Improvements

**Before**: Monolithic files with mixed concerns

- `main.py`: 556 lines
- `routes.py`: 1,183 lines
- `registry.py`: 437 lines
- `selector.py`: 486 lines

**After**: Modular architecture with single responsibilities

- `main.py`: 99 lines (-82%)
- `routes.py`: 252 lines (-79%)
- `registry.py`: 194 lines (-56%)
- `selector.py`: 329 lines (-32%)

### Changes Made

**1. API Layer Modularization**

- Extracted `formatting.py` - message formatting logic
- Extracted `tool_parser.py` - tool parsing from various formats
- Extracted `chat_handlers.py` - chat completion business logic
- `routes.py` now only handles HTTP routing (thin controllers)

**2. CLI Layer Separation**

- Created `cli/` package with:
  - `parser.py` - CLI argument parsing
  - `main_runner.py` - main application orchestration
  - `server_runner.py` - server lifecycle management
  - `test_runner.py` - test mode execution
  - `tool_server.py` - tool server management
**3. Model Data Externalization**

- Moved hardcoded data to JSON configs:
  - `config/models/model_metadata.json` - model metadata
  - `config/models/mlx_quant_sizes.json` - MLX VRAM requirements
  - `config/models/gguf_quant_sizes.json` - GGUF VRAM requirements
  - `config/models/selector_config.json` - selection constants
- `registry.py` now loads from JSON instead of hardcoded dicts

**4. Utility Centralization**

- Created `utils/` package:
  - `token_counter.py` - centralized token counting
  - `project_discovery.py` - project root detection
  - `network.py` - network utilities (IP detection)

**5. Interactive Mode Modularization**

- Created `interactive/` package:
  - `ui.py` - menu display and input handling
  - `display.py` - hardware and resource display
  - `tips.py` - educational content
  - `config_utils.py` - configuration selection

**6. Swarm Orchestration**

- Created `swarm/orchestrator.py` - generation orchestration logic
- Separated from `swarm/manager.py`

### Architecture Principles Established

1. **Single Responsibility**: each module does one thing
2. **No Files > 300 Lines**: most modules kept under the limit
3. **No Functions > 50 Lines**: large functions broken down
4. **No Nesting > 3 Levels**: deep nesting refactored
5. **DRY Principle**: code duplication eliminated
6. **Configuration Over Code**: static data in JSON files

### Benefits

- **Testability**: isolated modules are easier to test
- **Maintainability**: changes affect only relevant modules
- **Readability**: smaller files are easier to understand
- **Reusability**: utilities can be used across the codebase
- **Collaboration**: multiple developers can work on different modules

## Success Metrics

- ✅ Tool-related commits stabilized
- ✅ Zero "fix: prevent looping" commits
- ✅ All critical files under 300 lines
- ✅ Instructions stay under 2000 tokens
- ✅ 35 tests passing, no regressions
- ✅ Clean separation of concerns
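As a closing illustration of the "Configuration Over Code" principle from the refactor, a sketch of how `registry.py` might load the externalized model metadata — the `config/models/` layout follows the listing in this document, but the loader function itself is an assumption, not the project's actual code:

```python
import json
from pathlib import Path

# Hypothetical loader: registry data comes from JSON files instead of
# hardcoded dicts, so model metadata can change without a code change.
CONFIG_DIR = Path("config/models")

def load_model_metadata(config_dir: Path = CONFIG_DIR) -> dict:
    """Read model metadata from JSON rather than a hardcoded dict."""
    with (config_dir / "model_metadata.json").open() as f:
        return json.load(f)
```

Keeping the path as a default parameter also makes the loader trivially testable with a temporary directory, in line with the testability benefit noted above.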