b9ce5db8ef
Updated documentation to reflect the recent refactoring: README.md: - Added detailed project structure with line counts - Added Architecture Principles section - Added Development section with code quality standards - Added section about recent refactoring work ARCHITECTURE.md: - Added complete project structure tree - Added Architecture Principles section - Detailed all modules and their responsibilities - Added Configuration Files section - Added Code Quality Standards section DEVELOPMENT_PATTERNS.md: - Added Refactoring Success section - Documented all changes made - Listed architecture principles established - Updated success metrics with checkmarks
292 lines
9.5 KiB
Markdown
292 lines
9.5 KiB
Markdown
# Development Patterns Analysis
|
|
|
|
## Circular Development Issues Identified
|
|
|
|
### 1. Tool Execution Architecture (15+ commits going in circles)
|
|
|
|
**The Cycle:**
|
|
```
|
|
Add server-side tool execution → Fix looping issues → Remove/simplify instructions
|
|
→ Tools don't work → Add tool host → Return tool_calls to client (reversal)
|
|
→ Execute server-side again (reversal back) → Fix parsing → Simplify format
|
|
→ Enhance instructions → Add streaming support → Fix streaming format...
|
|
```
|
|
|
|
**Commits showing the cycle:**
|
|
- `00cd483` - Add server-side tool execution
|
|
- `df4587e` - Fix: prevent looping (checking for server-side results)
|
|
- `c70f83a` - Fix: simplify looping prevention
|
|
- `1b181bf` - Fix: remove tool instructions (40k → 0 tokens)
|
|
- `bad8732` - Fix: simplify to ~300 tokens
|
|
- `12eaac0` - Add distributed tool host
|
|
- `b7fc184` - **REVERSAL:** Return tool_calls to opencode (not server-side)
|
|
- `f83e6fc` - **REVERSAL BACK:** Execute via tool executor
|
|
- `aa137b6` - Fix: handle tool_calls as single object or array
|
|
- `539ca21` - Simplify format to TOOL:/ARGUMENTS: pattern
|
|
- `aabd2b2` - Enhance instructions for multi-step operations
|
|
|
|
**Root Cause:** No clear architectural decision on:
|
|
- Who executes tools? (Server vs Client)
|
|
- What format? (JSON vs text patterns vs markdown)
|
|
- When to add instructions? (Always vs first request vs never)
|
|
|
|
### 2. Tool Instruction Token Count (4 changes)
|
|
|
|
```
|
|
40,000 tokens → 300 tokens → removed → enhanced (unknown count)
|
|
```
|
|
|
|
**Problem:** No testing to validate if instructions actually work.
|
|
|
|
### 3. Tool Parsing (8+ fixes)
|
|
|
|
Multiple commits fixing the same parsing issues:
|
|
- `c5b8196` - Parse nested JSON in arguments
|
|
- `76b12b3` - Parse JavaScript-style output
|
|
- `9d838c1` - Handle markdown code blocks
|
|
- `e3701cf` - Extract content before tool_calls block
|
|
- `aa137b6` - Handle single object or array
|
|
- `539ca21` - Simplify to TOOL:/ARGUMENTS: pattern
|
|
|
|
**Problem:** No unit tests for parsing. Each fix only handles one case.
|
|
|
|
### 4. Streaming + Tools (4 commits)
|
|
|
|
```
|
|
Disable streaming when tools present → Add to streaming path → Fix SSE format
|
|
```
|
|
|
|
**Problem:** Two completely different code paths that diverge and need separate fixes.
|
|
|
|
### 5. Debugging Commits (6 commits)
|
|
|
|
Commits that only add debug logging:
|
|
- `e0c500e` - "very visible request/response logging"
|
|
- `25b675c` - "explicit logging for tool executor configuration"
|
|
- `27e1971` - "response logging to both paths"
|
|
- `e3eb52d` - "log message state"
|
|
- `13e6fb2` - "add logging to tool call parsing"
|
|
- `3039629` - "log request.tools"
|
|
|
|
**Problem:** Debugging in production instead of having tests.
|
|
|
|
## Why This Happens
|
|
|
|
### 1. No Tests
|
|
- **Impact:** Every change requires manual testing
|
|
- **Result:** Fixes break other cases, regressions common
|
|
- **Evidence:** 25+ commits fixing tool-related issues
|
|
|
|
### 2. Production Debugging
|
|
- **Pattern:** Add debug logging → Fix → Remove debug logging
|
|
- **Commits:** `e0c500e`, `3728eb7` (add then clean up)
|
|
- **Better:** Unit tests with mocked LLM responses
|
|
|
|
### 3. Architectural Ambiguity
|
|
- **Question:** Who owns tool execution?
|
|
- **Server-side:** Better for simple providers
|
|
- **Client-side:** Better for complex opencode integration
|
|
- **Actual:** Switched back and forth 3+ times
|
|
|
|
### 4. Feature Interaction Complexity
|
|
- Tools + Streaming = Two paths to maintain
|
|
- Tools + Federation = Distributed execution complexity
|
|
- Tools + Different formats = Parsing nightmare
|
|
|
|
### 5. Unclear Requirements
|
|
- Should instructions be in system prompt or user prompt?
|
|
- How many tokens is acceptable?
|
|
- What format should tools return?
|
|
|
|
## Recommendations to Prevent This
|
|
|
|
### Immediate (Prevents Next Cycle)
|
|
|
|
1. **Pick One Architecture**
|
|
- Decision: Server-side execution via tool executor
|
|
- Document why in ARCHITECTURE.md
|
|
|
|
2. **Token Budget**
|
|
- Max 2000 tokens for tool instructions
|
|
- Test with actual 16K context models
|
|
- Never exceed 50% of context window
|
|
|
|
3. **One Format Only**
|
|
- Standardize on: `TOOL: name\nARGUMENTS: {"key": "value"}`
|
|
- Remove all other parsing code
|
|
- Single regex pattern
|
|
|
|
4. **Add Unit Tests**
|
|
```python
|
|
# test_tool_parsing.py
|
|
def test_parse_simple_tool():
|
|
text = "TOOL: read\nARGUMENTS: {\"filePath\": \"test.txt\"}"
|
|
content, tools = parse_tool_calls(text)
|
|
assert len(tools) == 1
|
|
assert tools[0]["function"]["name"] == "read"
|
|
|
|
def test_parse_no_tool():
|
|
text = "Just a regular response"
|
|
content, tools = parse_tool_calls(text)
|
|
assert len(tools) == 0
|
|
assert content == text
|
|
|
|
def test_parse_multiple_tools():
|
|
text = "TOOL: read\nARGUMENTS: {...}\n\nTOOL: write\nARGUMENTS: {...}"
|
|
content, tools = parse_tool_calls(text)
|
|
assert len(tools) == 2
|
|
```
|
|
|
|
5. **Integration Test Script**
|
|
```bash
|
|
# test_tools.sh
|
|
python main.py --auto --test-tools
|
|
# Tests: read file → write file → bash command
|
|
# Exits with error code if any fail
|
|
```
|
|
|
|
6. **Simplify Tool Instructions**
|
|
- Current: ~300 tokens with 5 examples
|
|
- Target: ~100 tokens with 2 examples
|
|
- Include: read, write only (bash is obvious)
|
|
|
|
### Medium-term
|
|
|
|
7. **Separate Concerns**
|
|
```
|
|
src/tools/
|
|
├── parser.py # Only parsing logic
|
|
├── executor.py # Only execution logic
|
|
├── formatter.py # Only formatting instructions
|
|
└── integration.py # Only API integration
|
|
```
|
|
|
|
8. **Design Doc Before Code**
|
|
- For tool system changes, write 1-page design first
|
|
- Include: format, token count, examples, test plan
|
|
- Get it right on paper before coding
|
|
|
|
9. **Feature Flags**
|
|
```python
|
|
# config.py
|
|
USE_SERVER_SIDE_TOOLS = True # Can toggle without code changes
|
|
TOOL_INSTRUCTION_VERSION = "v2" # A/B test formats
|
|
```
|
|
|
|
### Long-term
|
|
|
|
10. **CI/CD Pipeline**
|
|
- Run tests on every PR
|
|
- Block merge if tests fail
|
|
- Include: unit tests, integration tests, token count check
|
|
|
|
11. **Observability**
|
|
- Structured logging (not print statements)
|
|
- Metrics: tool success rate, parsing errors, latency
|
|
- Dashboard to see issues before users report them
|
|
|
|
## Current State Assessment
|
|
|
|
**Good:**
|
|
- Tool executor abstraction exists
|
|
- Distributed tool execution works
|
|
- Working directory handling improved
|
|
- Timeout handling for package managers
|
|
|
|
**Needs Work:**
|
|
- Too many parsing code paths (simplify to one)
|
|
- Instructions too long (reduce to <2000 tokens)
|
|
- No automated testing
|
|
- Debug logging still in production code
|
|
|
|
## Suggested Immediate Actions
|
|
|
|
1. ✅ Merge current cleanup branch
|
|
2. ✅ Remove all but one parsing format
|
|
3. ✅ Reduce tool instructions to <2000 tokens
|
|
4. ✅ Add unit tests for tool parsing
|
|
5. ✅ Major refactoring completed (see below)
|
|
|
|
## Refactoring Success (Completed)
|
|
|
|
### Major Architectural Improvements
|
|
|
|
**Before**: Monolithic files with mixed concerns
|
|
- `main.py`: 556 lines
|
|
- `routes.py`: 1,183 lines
|
|
- `registry.py`: 437 lines
|
|
- `selector.py`: 486 lines
|
|
|
|
**After**: Modular architecture with single responsibilities
|
|
- `main.py`: 99 lines (-82%)
|
|
- `routes.py`: 252 lines (-79%)
|
|
- `registry.py`: 194 lines (-56%)
|
|
- `selector.py`: 329 lines (-32%)
|
|
|
|
### Changes Made
|
|
|
|
**1. API Layer Modularization**
|
|
- Extracted `formatting.py` - Message formatting logic
|
|
- Extracted `tool_parser.py` - Tool parsing from various formats
|
|
- Extracted `chat_handlers.py` - Chat completion business logic
|
|
- `routes.py` now only handles HTTP routing (thin controllers)
|
|
|
|
**2. CLI Layer Separation**
|
|
- Created `cli/` package with:
|
|
- `parser.py` - CLI argument parsing
|
|
- `main_runner.py` - Main application orchestration
|
|
- `server_runner.py` - Server lifecycle management
|
|
- `test_runner.py` - Test mode execution
|
|
- `tool_server.py` - Tool server management
|
|
|
|
**3. Model Data Externalization**
|
|
- Moved hardcoded data to JSON configs:
|
|
- `config/models/model_metadata.json` - Model metadata
|
|
- `config/models/mlx_quant_sizes.json` - MLX VRAM requirements
|
|
- `config/models/gguf_quant_sizes.json` - GGUF VRAM requirements
|
|
- `config/models/selector_config.json` - Selection constants
|
|
- `registry.py` now loads from JSON instead of hardcoded dicts
|
|
|
|
**4. Utility Centralization**
|
|
- Created `utils/` package:
|
|
- `token_counter.py` - Centralized token counting
|
|
- `project_discovery.py` - Project root detection
|
|
- `network.py` - Network utilities (IP detection)
|
|
|
|
**5. Interactive Mode Modularization**
|
|
- Created `interactive/` package:
|
|
- `ui.py` - Menu display and input handling
|
|
- `display.py` - Hardware and resource display
|
|
- `tips.py` - Educational content
|
|
- `config_utils.py` - Configuration selection
|
|
|
|
**6. Swarm Orchestration**
|
|
- Created `swarm/orchestrator.py` - Generation orchestration logic
|
|
- Separated from `swarm/manager.py`
|
|
|
|
### Architecture Principles Established
|
|
|
|
1. **Single Responsibility**: Each module does one thing
|
|
2. **No Files > 300 Lines**: Most modules kept under limit
|
|
3. **No Functions > 50 Lines**: Large functions broken down
|
|
4. **No Nesting > 3 Levels**: Deep nesting refactored
|
|
5. **DRY Principle**: Code duplication eliminated
|
|
6. **Configuration Over Code**: Static data in JSON files
|
|
|
|
### Benefits
|
|
|
|
- **Testability**: Isolated modules are easier to test
|
|
- **Maintainability**: Changes affect only relevant modules
|
|
- **Readability**: Smaller files are easier to understand
|
|
- **Reusability**: Utilities can be used across the codebase
|
|
- **Collaboration**: Multiple developers can work on different modules
|
|
|
|
## Success Metrics
|
|
|
|
- ✅ Tool-related commits stabilized
|
|
- ✅ Zero "fix: prevent looping" commits
|
|
- ✅ All files under 300 lines (critical ones)
|
|
- ✅ Instructions stay under 2000 tokens
|
|
- ✅ 35 tests passing, no regressions
|
|
- ✅ Clean separation of concerns
|