Compare commits
2 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| 0a97e4af8c | |||
| 580d1e5d17 |
@@ -151,3 +151,6 @@ cython_debug/
|
||||
config.local.yaml
|
||||
*.pid
|
||||
logs/
|
||||
|
||||
# Review reports
|
||||
reports/
|
||||
|
||||
+427
@@ -0,0 +1,427 @@
|
||||
# Agent Reviewer Rules
|
||||
|
||||
> **⚠️ IMPORTANT:** This document is for REVIEW AGENTS who handle commits, PRs, and code reviews.
|
||||
> Regular agents follow AGENT_WORKER.md for implementation tasks and DO NOT make commits.
|
||||
|
||||
## Review Philosophy
|
||||
|
||||
**Mission:** Prevent the circular development patterns identified in commit history.
|
||||
|
||||
**Standards:**
|
||||
- Reject code that doesn't meet quality bar
|
||||
- Ask for tests, don't accept "I'll add them later"
|
||||
- Check token counts for prompt changes
|
||||
- Verify architectural consistency
|
||||
- Demand clear error messages
|
||||
|
||||
**Reviewer Authority:**
|
||||
- Can block PR for: missing tests, token bloat, architecture violations
|
||||
- Cannot approve own code
|
||||
- Must provide constructive feedback with specific fixes
|
||||
|
||||
## Review Checklist
|
||||
|
||||
### Phase 1: Structure & Hygiene (Block if failed)
|
||||
|
||||
- [ ] **Branch naming follows convention**
|
||||
- Format: `type/description` (e.g., `fix/tool-parsing`)
|
||||
- Not: `quick-fix`, `temp-branch`, `dev`
|
||||
|
||||
- [ ] **Commit messages are clear**
|
||||
- Format: `type(scope): description`
|
||||
- No: `fix stuff`, `WIP`, `asdf`, `omg finally`
|
||||
- Each commit should be reviewable independently
|
||||
|
||||
- [ ] **No production debugging code**
|
||||
- Search for: `print(`, `console.log`, `debugger`, `TODO`, `FIXME`, `XXX`
|
||||
- Check: No commented-out code blocks
|
||||
- Check: No temporary files committed
|
||||
|
||||
- [ ] **Git history is clean**
|
||||
- No "fix typo" commits after initial commit
|
||||
- No "WIP" commits in PR
|
||||
- No merge commits (rebase instead)
|
||||
- Squash fixup commits
|
||||
|
||||
### Phase 2: Code Quality (Block if failed)
|
||||
|
||||
- [ ] **Tests exist and pass**
|
||||
- Unit tests for new functions
|
||||
- Integration tests for API changes
|
||||
- Run: `pytest -v` (must pass)
|
||||
- Coverage: ≥80% for new code
|
||||
- **BLOCKING:** No tests = No merge
|
||||
|
||||
- [ ] **Type hints present**
|
||||
- All function parameters typed
|
||||
- All return values typed
|
||||
- Run: `mypy src/` (must pass with zero errors)
|
||||
|
||||
- [ ] **No code smells**
|
||||
- No functions > 50 lines
|
||||
- No files > 300 lines
|
||||
- No indentation > 3 levels deep
|
||||
- No circular imports
|
||||
- No duplicate code (>3 lines copied)
|
||||
|
||||
- [ ] **Error handling is robust**
|
||||
- No bare `except:` clauses
|
||||
- All errors have clear messages
|
||||
- No silent failures
|
||||
- Edge cases handled
|
||||
|
||||
- [ ] **Documentation is adequate**
|
||||
- All public functions have docstrings
|
||||
- Complex logic has inline comments
|
||||
- README updated if user-facing change
|
||||
- Architecture doc updated if pattern changes
|
||||
|
||||
### Phase 3: Token Budget (Block if failed)
|
||||
|
||||
**For any prompt/instruction changes:**
|
||||
|
||||
- [ ] **Token count documented**
|
||||
- Before: X tokens
|
||||
- After: Y tokens
|
||||
- Change: +/- Z tokens
|
||||
|
||||
- [ ] **Within budget**
|
||||
- System prompt + instructions ≤ 2000 tokens (HARD LIMIT)
|
||||
- Leaves ≥ 50% context window for user input
|
||||
- **BLOCKING:** Over budget = Request reduction
|
||||
|
||||
- [ ] **Efficient wording**
|
||||
- No redundant examples
|
||||
- No verbose explanations
|
||||
- Prefer code over prose
|
||||
|
||||
**Token Counting Command:**
|
||||
```bash
|
||||
# Count tokens in a string
|
||||
echo "Your prompt here" | python -c "import sys; import tiktoken; enc = tiktoken.get_encoding('cl100k_base'); print(len(enc.encode(sys.stdin.read())))"
|
||||
```
|
||||
|
||||
### Phase 4: Architecture (Block if failed)
|
||||
|
||||
- [ ] **Consistent with ARCHITECTURE.md**
|
||||
- No new patterns without updating docs
|
||||
- No mixing of concerns
|
||||
- Follows existing module structure
|
||||
|
||||
- [ ] **No architecture changes in fixes**
|
||||
- Bug fixes should not refactor
|
||||
- Refactors should be separate PRs
|
||||
- **Exception:** If fix requires arch change, document WHY
|
||||
|
||||
- [ ] **Parser rules**
|
||||
- Only ONE parser per format
|
||||
- No alternative parsing paths
|
||||
- Clear regex patterns
|
||||
- Handles all documented cases
|
||||
|
||||
- [ ] **No feature flags in core**
|
||||
- Code should not have `if config.get("ENABLE_X"):`
|
||||
- Pick one approach, remove old one
|
||||
- A/B testing only in separate branch
|
||||
|
||||
### Phase 5: Research & Continuous Learning
|
||||
|
||||
**For significant changes (>100 lines or new algorithms):**
|
||||
|
||||
- [ ] **Research documented**
|
||||
- Check `research/` folder for related findings
|
||||
- PR description mentions alternatives considered
|
||||
- Links to sources (docs, papers, repos)
|
||||
- Not: "I thought this would work"
|
||||
- Yes: "Based on [source], this approach handles [case] better than [alternative]"
|
||||
|
||||
- [ ] **Best practices followed**
|
||||
- Implementation matches current language/framework conventions
|
||||
- No deprecated patterns
|
||||
- Modern Python features used appropriately (3.9+)
|
||||
|
||||
- [ ] **No reinvention**
|
||||
- Check if standard library solves the problem
|
||||
- Check if well-maintained package exists
|
||||
- If custom implementation needed, document WHY
|
||||
|
||||
**Research Documentation Requirements:**
|
||||
```markdown
|
||||
## Research
|
||||
- Alternatives considered: [list]
|
||||
- Sources: [links]
|
||||
- Decision: [why chosen approach]
|
||||
- Benchmarks: [if applicable]
|
||||
```
|
||||
|
||||
### Phase 6: Logic Correctness
|
||||
|
||||
- [ ] **Logic is sound**
|
||||
- Read through the code
|
||||
- Check edge cases
|
||||
- Verify error conditions
|
||||
- Question anything unclear
|
||||
|
||||
- [ ] **No performance regressions**
|
||||
- No blocking I/O in async functions (unless wrapped)
|
||||
- No memory leaks
|
||||
- No N+1 queries
|
||||
- Reasonable algorithmic complexity
|
||||
|
||||
- [ ] **Security check**
|
||||
- No SQL injection vectors
|
||||
- No command injection (bash execution sanitized)
|
||||
- Path traversal protection (for file ops)
|
||||
- No secrets in code
|
||||
|
||||
## Review Report Format
|
||||
|
||||
After review, write a report to `reports/PR-{number}-{branch}.md`:
|
||||
|
||||
```markdown
|
||||
# Review Report: PR #{number} - {branch}
|
||||
|
||||
**Reviewer:** {your name}
|
||||
**Date:** {YYYY-MM-DD}
|
||||
**Status:** [APPROVED / CHANGES_REQUESTED / BLOCKED]
|
||||
|
||||
## Summary
|
||||
Brief description of what this PR does and overall quality assessment.
|
||||
|
||||
## Detailed Findings
|
||||
|
||||
### ✅ Passed
|
||||
- [List items that passed review]
|
||||
- [Be specific: "Tests cover 85% of new code"]
|
||||
|
||||
### ⚠️ Warnings (Non-blocking)
|
||||
- [Minor issues that don't block merge]
|
||||
- [Style suggestions]
|
||||
- [Future improvements]
|
||||
|
||||
### ❌ Blockers (Must fix)
|
||||
1. **[Category]** [Specific issue]
|
||||
- **Location:** `file.py:123`
|
||||
- **Problem:** [What's wrong]
|
||||
- **Fix:** [Exactly what to change]
|
||||
- **Why:** [Why this matters]
|
||||
|
||||
2. **[Category]** [Specific issue]
|
||||
- ...
|
||||
|
||||
## Token Impact Analysis
|
||||
- Component: [what changed]
|
||||
- Before: [X] tokens
|
||||
- After: [Y] tokens
|
||||
- Impact: [+/- Z] tokens
|
||||
- Within budget: [Yes/No]
|
||||
|
||||
## Test Coverage
|
||||
- New code coverage: [X]%
|
||||
- Tests pass: [Yes/No]
|
||||
- Integration tests: [Present/Missing]
|
||||
|
||||
## Architecture Review
|
||||
- Follows existing patterns: [Yes/No]
|
||||
- Introduces new dependencies: [List if any]
|
||||
- Breaking changes: [Yes/No - explain if yes]
|
||||
|
||||
## Research Review
|
||||
- Alternatives considered: [Listed/None]
|
||||
- Sources cited: [Yes/No]
|
||||
- Best practices followed: [Yes/No]
|
||||
- Research documented: [Yes/No - location]
|
||||
|
||||
## Code Quality Score
|
||||
- Structure: [0-10]
|
||||
- Testing: [0-10]
|
||||
- Documentation: [0-10]
|
||||
- Logic: [0-10]
|
||||
- **Overall: [0-10]**
|
||||
|
||||
## Action Items
|
||||
- [ ] [Specific fix needed]
|
||||
- [ ] [Specific fix needed]
|
||||
- [ ] [Test to add]
|
||||
|
||||
## Verdict
|
||||
[APPROVED / CHANGES_REQUESTED / BLOCKED]
|
||||
|
||||
**If CHANGES_REQUESTED:**
|
||||
- Address all blockers
|
||||
- Re-request review when ready
|
||||
|
||||
**If BLOCKED:**
|
||||
- Major issues require architecture discussion
|
||||
- Schedule meeting before continuing
|
||||
```
|
||||
|
||||
## Severity Levels
|
||||
|
||||
### 🔴 BLOCKING (Cannot merge)
|
||||
- Missing tests for new functionality
|
||||
- Token budget exceeded
|
||||
- Bare `except:` clauses
|
||||
- Production debugging code (`print` statements)
|
||||
- Breaking changes without documentation
|
||||
- Security vulnerabilities
|
||||
- Tests failing
|
||||
- Type check errors
|
||||
- Architecture violations
|
||||
|
||||
### 🟡 CHANGES_REQUESTED (Fix before merge)
|
||||
- Unclear variable names
|
||||
- Missing docstrings
|
||||
- Inefficient algorithms
|
||||
- Missing error handling
|
||||
- Unclear commit messages
|
||||
- Minor style issues
|
||||
|
||||
### 🟢 APPROVED (Optional suggestions)
|
||||
- Style preferences
|
||||
- Future improvements
|
||||
- Optional refactors
|
||||
|
||||
## Common Issues to Watch For
|
||||
|
||||
### Issue 1: Tool Parsing Duplication
|
||||
```python
|
||||
# ❌ WRONG - Multiple parsers
|
||||
def parse_tools_v1(text): ...
|
||||
def parse_tools_v2(text): ...
|
||||
def parse_tools_legacy(text): ...
|
||||
|
||||
# ✅ CORRECT - Single parser
|
||||
TOOL_PATTERN = r'TOOL:\s*(\w+)\s*\nARGUMENTS:\s*(\{[^}]*\})'
|
||||
```
|
||||
|
||||
**Check:** Search for "def parse" - should be ONE per format.
|
||||
|
||||
### Issue 2: Token Bloat
|
||||
```python
|
||||
# ❌ WRONG - Too verbose
|
||||
SYSTEM_PROMPT = """
|
||||
You are an AI assistant. Here are detailed instructions...
|
||||
[2000 words of explanation]
|
||||
[10 examples]
|
||||
"""
|
||||
|
||||
# ✅ CORRECT - Concise
|
||||
SYSTEM_PROMPT = """Use TOOL: name\nARGUMENTS: {...} format. Available: read, write, bash."""
|
||||
```
|
||||
|
||||
**Check:** Count tokens, verify < 2000.
|
||||
|
||||
### Issue 3: Architecture Drift
|
||||
```python
|
||||
# ❌ WRONG - Mixing concerns in one file
|
||||
# src/api/routes.py
|
||||
def handle_request(): ...
|
||||
def parse_tools(): ...
|
||||
def execute_tool(): ...
|
||||
def format_response(): ...
|
||||
|
||||
# ✅ CORRECT - Separated
|
||||
# src/api/routes.py - only HTTP handling
|
||||
# src/tools/parser.py - only parsing
|
||||
# src/tools/executor.py - only execution
|
||||
```
|
||||
|
||||
**Check:** Each module has ONE responsibility.
|
||||
|
||||
### Issue 4: Debug Code Left In
|
||||
```python
|
||||
# ❌ WRONG
|
||||
def process(data):
|
||||
print(f"DEBUG: data={data}") # REMOVE THIS
|
||||
result = transform(data)
|
||||
print(f"DEBUG: result={result}") # REMOVE THIS
|
||||
return result
|
||||
|
||||
# ✅ CORRECT
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
def process(data):
|
||||
logger.debug("Processing data", extra={"data_size": len(data)})
|
||||
return transform(data)
|
||||
```
|
||||
|
||||
**Check:** `grep -r "print(" src/ --include="*.py" | grep -v "^#"`
|
||||
|
||||
### Issue 5: Missing Error Context
|
||||
```python
|
||||
# ❌ WRONG
|
||||
raise ValueError("Invalid input")
|
||||
|
||||
# ✅ CORRECT
|
||||
raise ValueError(f"Invalid model format: '{model_str}'. Expected: 'name:size:quant' (e.g., 'qwen:7b:q4')")
|
||||
```
|
||||
|
||||
**Check:** All errors explain what was expected vs received.
|
||||
|
||||
## Review Workflow
|
||||
|
||||
1. **First Pass: Structure** (5 min)
|
||||
- Check branch name, commits, no debug code
|
||||
- If failed → Write report, BLOCK
|
||||
|
||||
2. **Second Pass: Quality** (10 min)
|
||||
- Run tests, check types, review code
|
||||
- If failed → Write report, CHANGES_REQUESTED
|
||||
|
||||
3. **Third Pass: Deep Dive** (15 min)
|
||||
- Read logic, check edge cases
|
||||
- Verify token counts
|
||||
- Check architecture
|
||||
- Write detailed report
|
||||
|
||||
4. **Final Decision** (5 min)
|
||||
- APPROVE / CHANGES_REQUESTED / BLOCK
|
||||
- Write report to `reports/` folder
|
||||
- Post summary in PR comments
|
||||
|
||||
**Total time per review: 30-35 minutes**
|
||||
|
||||
## Reviewer Self-Check
|
||||
|
||||
Before submitting review:
|
||||
- [ ] I ran all tests locally
|
||||
- [ ] I checked type hints
|
||||
- [ ] I counted tokens (if applicable)
|
||||
- [ ] I read every line of changed code
|
||||
- [ ] My feedback is specific and actionable
|
||||
- [ ] I explained WHY for each blocker
|
||||
- [ ] I wrote a report to `reports/` folder
|
||||
|
||||
## Escalation
|
||||
|
||||
Escalate to architecture discussion if:
|
||||
- PR changes core patterns
|
||||
- Token budget cannot be met
|
||||
- Two reviewers disagree
|
||||
- Breaking changes proposed
|
||||
|
||||
**Don't just approve to be nice.**
|
||||
**Don't let technical debt accumulate.**
|
||||
|
||||
## Report Storage
|
||||
|
||||
All reports go in `reports/` folder:
|
||||
```
|
||||
reports/
|
||||
├── PR-123-fix-tool-parsing.md
|
||||
├── PR-124-add-federation.md
|
||||
├── PR-125-refactor-consensus.md
|
||||
└── README.md # Index of all reviews
|
||||
```
|
||||
|
||||
**This folder is gitignored - reports stay local.**
|
||||
|
||||
Generate index with:
|
||||
```bash
|
||||
ls -1 reports/PR-*.md | sort -t'-' -k2 -n > reports/README.md
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
**Remember: You're the last line of defense against technical debt. Be thorough, be kind, be strict.**
|
||||
+790
@@ -0,0 +1,790 @@
|
||||
# Agent Worker Rules
|
||||
|
||||
> **⚠️ IMPORTANT:** This document is for IMPLEMENTATION AGENTS (coding, testing, documentation).
|
||||
> **DO NOT MAKE COMMITS** - that's the AGENT_REVIEW.md agent's job.
|
||||
|
||||
## Pre-Flight Checklist (MUST complete before coding)
|
||||
|
||||
### ⚠️ GIT OPERATIONS REMINDER
|
||||
**DO NOT make commits.** Commits are ONLY handled by AGENT_REVIEW.md agents.
|
||||
You CAN create branches and stage files (git add), but DO NOT commit (git commit).
|
||||
|
||||
### 1. Token Budget Verification
|
||||
- [ ] System prompt + instructions ≤ 2000 tokens (hard limit)
|
||||
- [ ] Leave ≥ 50% of context window for user input
|
||||
- [ ] If adding documentation/examples, remove old ones to maintain budget
|
||||
- [ ] Use `tiktoken` or estimate: ~4 chars = 1 token
|
||||
|
||||
### 2. Test Plan Required
|
||||
Before writing ANY code, write a test plan:
|
||||
```markdown
|
||||
## Test Plan for [Feature]
|
||||
|
||||
### Unit Tests
|
||||
- [ ] Test case 1: [specific input] → [expected output]
|
||||
- [ ] Test case 2: [edge case]
|
||||
- [ ] Test case 3: [error condition]
|
||||
|
||||
### Integration Tests
|
||||
- [ ] End-to-end flow: [steps]
|
||||
- [ ] Expected result: [what success looks like]
|
||||
|
||||
### Manual Verification
|
||||
- [ ] Command to run: [exact command]
|
||||
- [ ] Expected output: [what to see]
|
||||
```
|
||||
|
||||
### 3. Design Decision Document
|
||||
For any change > 50 lines:
|
||||
```markdown
|
||||
## Design Decision
|
||||
|
||||
### Problem
|
||||
[What are we solving?]
|
||||
|
||||
### Options Considered
|
||||
1. [Option A] - Pros: ..., Cons: ...
|
||||
2. [Option B] - Pros: ..., Cons: ...
|
||||
|
||||
### Decision
|
||||
[Which option and WHY]
|
||||
|
||||
### Impact
|
||||
- Token count change: [+/- X tokens]
|
||||
- Breaking changes: [Yes/No]
|
||||
- Migration needed: [Yes/No]
|
||||
```
|
||||
|
||||
## Coding Rules
|
||||
|
||||
### Rule 1: One Feature = One Commit
|
||||
**NOTE:** Regular agents DO NOT make commits. AGENT_REVIEW.md agents handle commits.
|
||||
|
||||
When AGENT_REVIEW.md agents make commits:
|
||||
- Never combine unrelated changes in one commit
|
||||
- If you fix a bug AND refactor, make 2 commits
|
||||
- Commit message format: `type(scope): description`
|
||||
- Types: `feat`, `fix`, `refactor`, `test`, `docs`, `chore`
|
||||
- Example: `feat(tools): add working directory support`
|
||||
|
||||
### Rule 2: Tests First (TDD)
|
||||
```python
|
||||
# BAD: Write code, maybe test later
|
||||
def parse_tools(text):
|
||||
# ... implementation ...
|
||||
pass
|
||||
|
||||
# GOOD: Write test first
|
||||
def test_parse_simple_tool():
|
||||
text = 'TOOL: read\nARGUMENTS: {"filePath": "test.txt"}'
|
||||
content, tools = parse_tool_calls(text)
|
||||
assert len(tools) == 1
|
||||
assert tools[0]["function"]["name"] == "read"
|
||||
|
||||
# Then write minimal code to pass
|
||||
```
|
||||
|
||||
### Rule 3: No Production Debugging
|
||||
- NEVER add `print()` statements for debugging
|
||||
- Use `logging` module with appropriate levels
|
||||
- Remove ALL debug logging before committing
|
||||
- Exception: Structured logging for observability (metrics, errors)
|
||||
|
||||
```python
|
||||
# BAD
|
||||
def process_request(request):
|
||||
print(f"DEBUG: Got request {request}") # REMOVE THIS
|
||||
result = handle(request)
|
||||
print(f"DEBUG: Result {result}") # REMOVE THIS
|
||||
return result
|
||||
|
||||
# GOOD
|
||||
def process_request(request):
|
||||
logger.debug("Processing request", extra={"request_id": request.id})
|
||||
result = handle(request)
|
||||
return result
|
||||
```
|
||||
|
||||
### Rule 4: Architecture Consistency
|
||||
- Check ARCHITECTURE.md before changing patterns
|
||||
- If unsure, ask in PR description
|
||||
- NEVER change architecture in a "fix" commit
|
||||
- Architecture changes require design doc + team review
|
||||
|
||||
### Rule 5: Parse Once, Parse Well
|
||||
- ONE parser per format
|
||||
- If adding new format, remove old one
|
||||
- Parser must handle all documented cases
|
||||
- Parser must fail gracefully (return empty, not crash)
|
||||
|
||||
```python
|
||||
# BAD: Multiple parsers for same thing
|
||||
def parse_tools_v1(text): ...
|
||||
def parse_tools_v2(text): ...
|
||||
def parse_tools_legacy(text): ...
|
||||
|
||||
# GOOD: Single parser with clear regex
|
||||
TOOL_PATTERN = r'TOOL:\s*(\w+)\s*\nARGUMENTS:\s*(\{[^}]*\})'
|
||||
|
||||
def parse_tool_calls(text: str) -> Tuple[str, List[dict]]:
|
||||
matches = list(re.finditer(TOOL_PATTERN, text, re.IGNORECASE))
|
||||
if not matches:
|
||||
return text, []
|
||||
# ... rest of parsing ...
|
||||
```
|
||||
|
||||
### Rule 6: Token-Aware Documentation
|
||||
- Every docstring/example has a token cost
|
||||
- Count tokens before adding
|
||||
- If over budget, remove something else
|
||||
- Prioritize: Code clarity > Examples > Explanations
|
||||
|
||||
```python
|
||||
# BAD: 150 tokens of fluff
|
||||
def calculate(x, y):
|
||||
"""
|
||||
This function calculates the sum of two numbers.
|
||||
|
||||
The sum is calculated by using the built-in Python
|
||||
addition operator which adds the values together.
|
||||
|
||||
Args:
|
||||
x (int): The first number to add
|
||||
y (int): The second number to add
|
||||
|
||||
Returns:
|
||||
int: The sum of x and y
|
||||
|
||||
Example:
|
||||
>>> calculate(1, 2)
|
||||
3
|
||||
"""
|
||||
return x + y
|
||||
|
||||
# GOOD: 20 tokens, clear enough
|
||||
def calculate(x: int, y: int) -> int:
|
||||
"""Return sum of x and y."""
|
||||
return x + y
|
||||
```
|
||||
|
||||
### Rule 7: Clear Error Messages
|
||||
- Every error must tell user EXACTLY what went wrong
|
||||
- Include context: what was expected vs what was received
|
||||
- Suggest fix if possible
|
||||
|
||||
```python
|
||||
# BAD
|
||||
raise ValueError("Invalid input")
|
||||
|
||||
# GOOD
|
||||
raise ValueError(f"Invalid model format: '{model_str}'. Expected: 'name:size:quant' (e.g., 'qwen:7b:q4')")
|
||||
```
|
||||
|
||||
### Rule 8: No Circular Imports
|
||||
```python
|
||||
# BAD: src/a.py imports src/b.py, src/b.py imports src/a.py
|
||||
|
||||
# GOOD: Use dependency injection or move shared code to common module
|
||||
```
|
||||
|
||||
## Git Workflow Rules
|
||||
|
||||
### CRITICAL: Commit Handling
|
||||
|
||||
**REGULAR AGENTS: DO NOT MAKE COMMITS**
|
||||
- Regular agents do NOT create commits, pull requests, or manage git history
|
||||
- Commits are ONLY handled by agents following AGENT_REVIEW.md guidelines
|
||||
- If you need to commit code, the AGENT_REVIEW.md agent should handle it
|
||||
- Exception: You may manually stage files (git add) for the review agent
|
||||
- **You CAN create and checkout branches** (that's fine) - just don't commit to them
|
||||
|
||||
### Branch Strategy
|
||||
|
||||
**Main Branches (Protected):**
|
||||
- `main` - Production-ready code only
|
||||
- `develop` - Integration branch for features (optional for small projects)
|
||||
|
||||
**Working Branches (Temporary - AGENT_REVIEW.md ONLY):**
|
||||
```
|
||||
feature/description # New features
|
||||
fix/description # Bug fixes
|
||||
refactor/description # Code refactoring
|
||||
hotfix/description # Critical production fixes
|
||||
docs/description # Documentation only
|
||||
experiment/description # Experimental work (may be deleted)
|
||||
```
|
||||
|
||||
**Note:** Regular agents should NOT create branches or handle git operations
|
||||
|
||||
### Workflow Steps
|
||||
|
||||
#### 1. Starting New Work
|
||||
```bash
|
||||
# ALWAYS start from main
|
||||
git checkout main
|
||||
git pull origin main
|
||||
|
||||
# Create feature branch
|
||||
git checkout -b feature/description
|
||||
|
||||
# Push branch to remote immediately
|
||||
git push -u origin feature/description
|
||||
```
|
||||
|
||||
#### 2. During Development
|
||||
```bash
|
||||
# Commit often (small, logical commits)
|
||||
git add -p # Stage interactively (review each change)
|
||||
git commit -m "feat(scope): description"
|
||||
|
||||
# Push regularly (backup)
|
||||
git push origin feature/description
|
||||
|
||||
# Keep up-to-date with main
|
||||
git fetch origin
|
||||
git rebase origin/main # Resolve conflicts immediately
|
||||
```
|
||||
|
||||
#### 3. Before PR (Final Cleanup)
|
||||
```bash
|
||||
# Interactive rebase to clean history
|
||||
git rebase -i main
|
||||
|
||||
# Squash these:
|
||||
# - "fix typo"
|
||||
# - "WIP"
|
||||
# - "asdf"
|
||||
# - "omg finally"
|
||||
# - Multiple attempts at same fix
|
||||
|
||||
# Keep separate:
|
||||
# - Logical feature steps
|
||||
# - Refactoring separate from features
|
||||
# - Test additions separate from code changes
|
||||
```
|
||||
|
||||
#### 4. Creating PR
|
||||
- Push final branch: `git push origin feature/description`
|
||||
- Create PR to `main` (not develop unless project uses git-flow)
|
||||
- Fill PR template completely
|
||||
- Request review from AGENT_REVIEW.md qualified reviewer
|
||||
- Link related issues: `Closes #123`, `Fixes #456`
|
||||
|
||||
### Commit Rules
|
||||
|
||||
**Commit Frequency:**
|
||||
- Commit after each logical step (not just at end of day)
|
||||
- Each commit should leave codebase in working state
|
||||
- "Work in progress" commits OK on feature branches (clean before PR)
|
||||
|
||||
**Commit Size:**
|
||||
- Max 200 lines changed per commit
|
||||
- Max 5 files changed per commit (unless related)
|
||||
- Each commit reviewable in 5 minutes
|
||||
- Split large changes:
|
||||
```bash
|
||||
# BAD: One giant commit
|
||||
git commit -am "Add federation + fix bugs + refactor + docs"
|
||||
|
||||
# GOOD: Separate commits
|
||||
git commit -m "refactor(network): extract peer discovery logic"
|
||||
git commit -m "feat(federation): implement cross-swarm voting"
|
||||
git commit -m "fix(federation): handle peer timeout edge case"
|
||||
git commit -m "docs: update federation architecture docs"
|
||||
```
|
||||
|
||||
**Commit Message Format:**
|
||||
```
|
||||
type(scope): subject (50 chars or less)
|
||||
|
||||
Body (wrap at 72 chars):
|
||||
- Why this change was made
|
||||
- What problem it solves
|
||||
- Any breaking changes or migration notes
|
||||
|
||||
Refs: #123, #456
|
||||
```
|
||||
|
||||
**Types:**
|
||||
- `feat`: New feature
|
||||
- `fix`: Bug fix
|
||||
- `refactor`: Code restructuring (no behavior change)
|
||||
- `test`: Adding/updating tests
|
||||
- `docs`: Documentation only
|
||||
- `chore`: Build, dependencies, tooling
|
||||
- `perf`: Performance improvement
|
||||
- `style`: Formatting (no code change)
|
||||
|
||||
**Subject Rules:**
|
||||
- Use imperative mood: "Add feature" not "Added feature"
|
||||
- No period at end
|
||||
- Lowercase after type
|
||||
- Max 50 characters
|
||||
|
||||
### Branch Hygiene
|
||||
|
||||
**DO:**
|
||||
- Create branch from latest main
|
||||
- Use descriptive branch names
|
||||
- Push branch to remote immediately
|
||||
- Rebase onto main regularly
|
||||
- Delete merged branches
|
||||
- Squash fixup commits before PR
|
||||
|
||||
**DON'T:**
|
||||
- Commit directly to main
|
||||
- Have long-lived branches (>1 week without rebase)
|
||||
- Include unrelated changes in one branch
|
||||
- Commit broken code (even temporarily)
|
||||
- Force push to shared branches
|
||||
- Merge without review
|
||||
|
||||
### Handling Conflicts
|
||||
|
||||
```bash
|
||||
# While rebasing
|
||||
git rebase main
|
||||
# Conflicts happen...
|
||||
|
||||
# Resolve conflicts in files
|
||||
git add <resolved-files>
|
||||
git rebase --continue
|
||||
|
||||
# If messed up, abort
|
||||
git rebase --abort
|
||||
```
|
||||
|
||||
**Conflict Resolution Rules:**
|
||||
1. Understand both changes before resolving
|
||||
2. Don't just pick "ours" or "theirs"
|
||||
3. Test after resolving
|
||||
4. Commit message should explain resolution
|
||||
|
||||
### Emergency Procedures
|
||||
|
||||
**Committed to wrong branch:**
|
||||
```bash
|
||||
# Undo last commit (keep changes)
|
||||
git reset HEAD~1
|
||||
|
||||
# Stash changes
|
||||
git stash
|
||||
|
||||
# Switch to correct branch
|
||||
git checkout correct-branch
|
||||
|
||||
# Apply changes
|
||||
git stash pop
|
||||
|
||||
# Commit properly
|
||||
git commit -m "..."
|
||||
```
|
||||
|
||||
**Need to undo pushed commit:**
|
||||
```bash
|
||||
# Revert (creates new commit, safe for shared history)
|
||||
git revert <commit-hash>
|
||||
git push origin branch-name
|
||||
|
||||
# OR if feature branch not shared yet
|
||||
# Reset and force push (DANGEROUS)
|
||||
git reset --hard HEAD~1
|
||||
git push --force-with-lease origin branch-name
|
||||
```
|
||||
|
||||
### Release Process
|
||||
|
||||
**NOTE:** Release process should be handled by AGENT_REVIEW.md agents.
|
||||
|
||||
```bash
|
||||
# Create release branch
|
||||
git checkout -b release/v1.2.0
|
||||
|
||||
# Bump version, update changelog
|
||||
git commit -m "chore: bump version to 1.2.0"
|
||||
|
||||
# Tag release
|
||||
git tag -a v1.2.0 -m "Release version 1.2.0"
|
||||
git push origin v1.2.0
|
||||
|
||||
# Merge to main
|
||||
git checkout main
|
||||
git merge --no-ff release/v1.2.0
|
||||
git push origin main
|
||||
|
||||
# Delete release branch
|
||||
git branch -d release/v1.2.0
|
||||
```
|
||||
|
||||
### What Regular Agents Should NOT Do
|
||||
|
||||
**REGULAR AGENTS DO NOT:**
|
||||
- Make commits (git commit)
|
||||
- Create pull requests
|
||||
- Push to remote repositories
|
||||
- Merge branches
|
||||
- Manage git history (rebase, reset, etc.)
|
||||
- Delete branches
|
||||
|
||||
**REGULAR AGENTS CAN:**
|
||||
- Create and checkout branches (git checkout -b)
|
||||
- Stage files for review (git add)
|
||||
- Switch between branches
|
||||
|
||||
**REGULAR AGENTS SHOULD:**
|
||||
- Write code and tests
|
||||
- Run tests locally
|
||||
- Use logging instead of print()
|
||||
- Follow code quality standards
|
||||
- Document changes in code comments or design docs
|
||||
- Hand off completed work to AGENT_REVIEW.md agent for commit/PR creation
|
||||
|
||||
**Example Workflow:**
|
||||
```
|
||||
1. Agent reads task from user
|
||||
2. Agent creates feature branch (git checkout -b feature/name)
|
||||
3. Agent implements feature (writes code, tests, docs)
|
||||
4. Agent stages changes for review (git add)
|
||||
5. Agent reports completion with summary of changes
|
||||
6. AGENT_REVIEW.md agent:
|
||||
- Reviews code quality
|
||||
- Makes commits
|
||||
- Creates PR
|
||||
```
|
||||
|
||||
### Pre-Commit Checklist
|
||||
- [ ] Code passes `pytest` (if tests exist)
|
||||
- [ ] No `print()` statements (use logging)
|
||||
- [ ] No bare `except:` clauses
|
||||
- [ ] All functions have type hints
|
||||
- [ ] All public functions have docstrings
|
||||
- [ ] No TODO comments (create issues instead)
|
||||
- [ ] Token count checked (if modifying prompts)
|
||||
|
||||
## Testing Requirements
|
||||
|
||||
### Unit Test Coverage
|
||||
Minimum 80% coverage for:
|
||||
- Parsing functions
|
||||
- Business logic
|
||||
- State machines
|
||||
|
||||
### Integration Tests Required For:
|
||||
- API endpoints
|
||||
- Tool execution
|
||||
- File operations
|
||||
- Network calls (mocked)
|
||||
|
||||
### Test File Structure
|
||||
```
|
||||
tests/
|
||||
├── unit/
|
||||
│ ├── test_parser.py
|
||||
│ ├── test_executor.py
|
||||
│ └── test_consensus.py
|
||||
├── integration/
|
||||
│ ├── test_api.py
|
||||
│ └── test_tools.py
|
||||
└── fixtures/
|
||||
└── sample_responses.json
|
||||
```
|
||||
|
||||
## Code Quality Standards
|
||||
|
||||
### Python Style
|
||||
- Follow PEP 8
|
||||
- Use type hints for all function signatures
|
||||
- Max line length: 100 characters
|
||||
- Max function length: 50 lines
|
||||
- Max file length: 300 lines (split if larger)
|
||||
|
||||
### Imports (Order Matters)
|
||||
```python
|
||||
# 1. Standard library
|
||||
import os
|
||||
import sys
|
||||
from typing import List
|
||||
|
||||
# 2. Third party
|
||||
import numpy as np
|
||||
from fastapi import APIRouter
|
||||
|
||||
# 3. Local (absolute imports only)
|
||||
from src.tools.executor import ToolExecutor
|
||||
from src.swarm.manager import SwarmManager
|
||||
```
|
||||
|
||||
### Documentation Standards
|
||||
Every module must have:
|
||||
```python
|
||||
"""Module purpose in one line.
|
||||
|
||||
Longer description if needed (2-3 sentences max).
|
||||
"""
|
||||
```
|
||||
|
||||
Every public function must have:
|
||||
```python
|
||||
def process_data(data: dict, options: Optional[dict] = None) -> Result:
|
||||
"""Process data with given options.
|
||||
|
||||
Args:
|
||||
data: Input data to process
|
||||
options: Processing options (default: None)
|
||||
|
||||
Returns:
|
||||
Processed result
|
||||
|
||||
Raises:
|
||||
ValueError: If data is invalid
|
||||
"""
|
||||
```
|
||||
|
||||
## Architecture Rules
|
||||
|
||||
### No Feature Flags in Core Logic
|
||||
```python
|
||||
# BAD
|
||||
if config.get("USE_NEW_PARSER", False):
|
||||
result = new_parser(text)
|
||||
else:
|
||||
result = old_parser(text)
|
||||
|
||||
# GOOD: Pick one, remove the other
|
||||
def parse_tool_calls(text: str) -> Tuple[str, List[dict]]:
|
||||
"""Parse tool calls from text."""
|
||||
# Single implementation
|
||||
```
|
||||
|
||||
### No Code Duplication
|
||||
- If you copy-paste > 3 lines, extract to function
|
||||
- Shared code goes in `src/common/` or `src/utils/`
|
||||
|
||||
### Separation of Concerns
|
||||
```
|
||||
src/
|
||||
├── parser/ # Only parsing logic
|
||||
├── executor/ # Only execution logic
|
||||
├── formatter/ # Only formatting/output
|
||||
└── integration/ # Only API glue code
|
||||
```
|
||||
|
||||
## Forbidden Patterns
|
||||
|
||||
### Never Do These:
|
||||
1. **Bare except clauses** - Always catch specific exceptions
|
||||
2. **Production debugging** - No `print()`, use logging
|
||||
3. **Multiple return formats** - One function = one return type
|
||||
4. **Silent failures** - Always log/report errors
|
||||
5. **Magic numbers** - Use named constants
|
||||
6. **Global state** - Use dependency injection
|
||||
7. **Deep nesting** - Max 3 levels of indentation
|
||||
8. **Circular dependencies** - Re-architect if needed
|
||||
|
||||
## Review Preparation
|
||||
|
||||
Before marking PR ready:
|
||||
|
||||
1. **Self-Review Checklist** (check each item):
|
||||
- [ ] Tests pass: `pytest -v`
|
||||
- [ ] Type checking: `mypy src/`
|
||||
- [ ] Linting: `ruff check src/`
|
||||
- [ ] Formatting: `black src/`
|
||||
- [ ] Token count verified (if applicable)
|
||||
- [ ] No debug code left in
|
||||
- [ ] Commit messages follow format
|
||||
- [ ] Documentation updated
|
||||
|
||||
2. **PR Description Template**:
|
||||
```markdown
|
||||
## Changes
|
||||
- [Brief description]
|
||||
|
||||
## Testing
|
||||
- [How you tested it]
|
||||
|
||||
## Token Impact (if applicable)
|
||||
- Before: X tokens
|
||||
- After: Y tokens
|
||||
- Change: +/- Z tokens
|
||||
|
||||
## Checklist
|
||||
- [ ] Tests added/updated
|
||||
- [ ] Documentation updated
|
||||
- [ ] Self-review completed
|
||||
```
|
||||
|
||||
3. **Run Final Verification**:
|
||||
```bash
|
||||
# Run all checks
|
||||
pytest && mypy src/ && ruff check src/ && black --check src/
|
||||
```
|
||||
|
||||
## Continuous Learning & Research
|
||||
|
||||
You MUST periodically research best practices and alternative implementations. This prevents stagnation and ensures we're using proven approaches.
|
||||
|
||||
### When to Research
|
||||
|
||||
**Before Major Features:**
|
||||
- Spend 15-30 minutes researching similar implementations
|
||||
- Check: GitHub, Stack Overflow, official docs, research papers
|
||||
- Document findings in PR description
|
||||
|
||||
**Monthly Reviews:**
|
||||
- Review project's core technologies for updates
|
||||
- Check if better libraries/algorithms exist
|
||||
- Look for deprecated patterns we're using
|
||||
|
||||
**When Stuck:**
|
||||
- Don't brute force a solution
|
||||
- Research how others solved similar problems
|
||||
- Consider if problem indicates architectural issue
|
||||
|
||||
### What to Research
|
||||
|
||||
**1. Best Practices**
|
||||
```bash
|
||||
# Search queries to use:
|
||||
"python async best practices 2024"
|
||||
"fastapi error handling patterns"
|
||||
"LLM consensus voting algorithms"
|
||||
"gguf quantization comparison"
|
||||
```
|
||||
|
||||
**2. Similar Implementations**
|
||||
- Search GitHub for similar projects
|
||||
- Read their architecture decisions
|
||||
- Check their issues for pitfalls they hit
|
||||
- Note: Don't copy code blindly, understand WHY
|
||||
|
||||
**3. Research Papers & Benchmarks**
|
||||
- For consensus algorithms
|
||||
- For quantization strategies
|
||||
- For context window optimization
|
||||
- For distributed systems patterns
|
||||
|
||||
**4. Library Updates**
|
||||
- Check CHANGELOG of major dependencies
|
||||
- Review migration guides
|
||||
- Test new features in separate branch
|
||||
|
||||
### Documentation of Research
|
||||
|
||||
Create `research/YYYY-MM-DD-topic.md` for significant findings:
|
||||
|
||||
```markdown
|
||||
# Research: [Topic]
|
||||
|
||||
**Date:** YYYY-MM-DD
|
||||
**Researcher:** [Name]
|
||||
**Trigger:** [Why researched this]
|
||||
|
||||
## Findings
|
||||
|
||||
### Option 1: [Name]
|
||||
- Source: [Link]
|
||||
- Pros: ...
|
||||
- Cons: ...
|
||||
- Complexity: Low/Medium/High
|
||||
|
||||
### Option 2: [Name]
|
||||
- Source: [Link]
|
||||
- Pros: ...
|
||||
- Cons: ...
|
||||
- Complexity: Low/Medium/High
|
||||
|
||||
## Recommendation
|
||||
[Which option and WHY]
|
||||
|
||||
## Implementation Notes
|
||||
[Specific code changes needed]
|
||||
|
||||
## Risks
|
||||
[What could go wrong]
|
||||
```
|
||||
|
||||
### Research Checklist
|
||||
|
||||
**Before implementing:**
|
||||
- [ ] Searched for similar open-source implementations
|
||||
- [ ] Checked recent best practices (2023+)
|
||||
- [ ] Looked for benchmarking data if applicable
|
||||
- [ ] Reviewed alternative approaches
|
||||
- [ ] Considered long-term maintenance implications
|
||||
|
||||
**After implementing:**
|
||||
- [ ] Documented why chosen approach was selected
|
||||
- [ ] Added comments linking to research sources
|
||||
- [ ] Created test comparing against alternatives (if applicable)
|
||||
|
||||
### Example Research Topics
|
||||
|
||||
**Immediate:**
|
||||
- "Python type hints best practices 2024"
|
||||
- "FastAPI dependency injection patterns"
|
||||
- "LLM tool use format comparison"
|
||||
|
||||
**Short-term:**
|
||||
- "Consensus algorithms for distributed LLM systems"
|
||||
- "Context window compression techniques"
|
||||
- "GGUF quantization vs other formats"
|
||||
|
||||
**Long-term:**
|
||||
- "Speculative decoding implementation"
|
||||
- "PagedAttention for multiple workers"
|
||||
- "RAG integration patterns"
|
||||
|
||||
### Research Sources
|
||||
|
||||
**Reliable:**
|
||||
- Official documentation (Python, FastAPI, etc.)
|
||||
- Well-maintained GitHub repos (>1k stars, active)
|
||||
- Recent conference talks (PyCon, NeurIPS, etc.)
|
||||
- Research papers with code (Papers With Code)
|
||||
- Official blogs (Python.org, FastAPI.tiangolo.com)
|
||||
|
||||
**Use with Caution:**
|
||||
- Medium articles (variable quality)
|
||||
- Old Stack Overflow answers (>2 years)
|
||||
- Tutorial sites (often outdated)
|
||||
- YouTube videos (hard to verify)
|
||||
|
||||
### Integration with Development
|
||||
|
||||
**Weekly:**
|
||||
- Spend 30 minutes reading about one technology we use
|
||||
- Note any improvements we could make
|
||||
- Create issues for promising findings
|
||||
|
||||
**Monthly:**
|
||||
- Review all open research issues
|
||||
- Prioritize based on impact vs effort
|
||||
- Schedule implementation of high-value items
|
||||
|
||||
**Quarterly:**
|
||||
- Architecture review: Are our patterns still best?
|
||||
- Dependency audit: Updates needed?
|
||||
- Performance review: Could we be faster?
|
||||
|
||||
---
|
||||
|
||||
**Remember:**
|
||||
- Research prevents reinvention of the wheel
|
||||
- But don't research forever - timebox it (30 min max for most decisions)
|
||||
- Document findings so others don't repeat the research
|
||||
- Apply critical thinking - "best practice" depends on context
|
||||
|
||||
---
|
||||
|
||||
## Breaking This Ruleset
|
||||
|
||||
If you MUST break a rule:
|
||||
1. Document WHY in code comments
|
||||
2. Get explicit approval in PR
|
||||
3. Create follow-up issue to fix properly
|
||||
4. Never break Rule 3 (No Production Debugging)
|
||||
|
||||
---
|
||||
|
||||
**Remember: Quality over speed. A fix that takes 2 days with tests is better than a fix that takes 2 hours and breaks 3 other things.**
|
||||
-204
@@ -1,204 +0,0 @@
|
||||
# Network Federation Status
|
||||
|
||||
## Overview
|
||||
Local Swarm has a federation system designed to allow multiple instances to collaborate on the same network, enabling distributed consensus and load balancing across multiple machines.
|
||||
|
||||
## Current Implementation Status
|
||||
|
||||
### ✅ What's Working
|
||||
|
||||
#### 1. Network Discovery (`src/network/discovery.py`)
|
||||
**Purpose**: Automatic discovery of other Local Swarm instances on the local network using mDNS/Bonjour.
|
||||
|
||||
**Key Components**:
|
||||
- `SwarmDiscovery` class - Main discovery service
|
||||
- `PeerInfo` dataclass - Stores information about peer swarms
|
||||
- `start_advertising()` - Announces this swarm to the network
|
||||
- `start_discovery()` - Listens for other swarms on the network
|
||||
- `create_discovery_service()` - Factory function to create discovery instance
|
||||
|
||||
**How It Works**:
|
||||
- Uses mDNS service type: `_local-swarm._tcp.local.`
|
||||
- Advertises on port 63323 (discovery) + API port (17615)
|
||||
- Broadcasts: version, instances, model_id, hardware_summary
|
||||
- Peers timeout after 60 seconds if not seen
|
||||
|
||||
#### 2. Federation Client (`src/network/federation.py`)
|
||||
**Purpose**: Communication protocol between peer swarms.
|
||||
|
||||
**Key Components**:
|
||||
- `FederationClient` class - HTTP client for peer communication
|
||||
- `FederatedSwarm` class - Wraps local swarm with federation logic
|
||||
- `request_vote()` - Gets generation results from peers
|
||||
- `generate_with_federation()` - Coordinates distributed generation
|
||||
- Federation strategies: `best_of_n`, `weighted_vote`, `first_valid`
|
||||
|
||||
**API Endpoints** (not yet exposed):
|
||||
- `POST /v1/federation/vote` - Request generation from peer
|
||||
- `GET /v1/federation/health` - Check peer health
|
||||
|
||||
#### 3. Network Binding (`main.py`)
|
||||
**Purpose**: Secure local network access without internet exposure.
|
||||
|
||||
**Implementation**:
|
||||
- `get_local_ip()` - Detects local network IP (192.x.x.x or 100.x.x.x)
|
||||
- Binds to specific local IP instead of 0.0.0.0
|
||||
- Falls back to localhost if not on private network
|
||||
|
||||
## ❌ What's Missing
|
||||
|
||||
### Critical Gap: No Integration
|
||||
**The federation system exists as standalone modules but is NOT connected to the main application flow.**
|
||||
|
||||
**Specific Issues**:
|
||||
|
||||
1. **No CLI Flag**: No `--federation` or `--enable-federation` argument in `main.py`
|
||||
|
||||
2. **Discovery Never Starts**:
|
||||
- `SwarmDiscovery` class is imported in `network/__init__.py`
|
||||
- But never instantiated or started in `main.py`
|
||||
- `start_advertising()` and `start_discovery()` are never called
|
||||
|
||||
3. **Federation Never Starts**:
|
||||
- `FederatedSwarm` class exists but is never instantiated
|
||||
- `main.py` calls `swarm.generate()` directly
|
||||
- Should call `federated_swarm.generate_with_federation()` when enabled
|
||||
|
||||
4. **API Routes Not Registered**:
|
||||
- Federation endpoints exist in `federation.py` but aren't added to FastAPI router
|
||||
- Routes in `src/api/routes.py` don't include `/v1/federation/*`
|
||||
|
||||
5. **No Peer Management UI**:
|
||||
- No way to see discovered peers
|
||||
- No status dashboard for federation
|
||||
- No manual peer configuration
|
||||
|
||||
## File Structure
|
||||
|
||||
```
|
||||
src/network/
|
||||
├── __init__.py # Exports SwarmDiscovery, FederationClient, etc.
|
||||
├── discovery.py # mDNS/Bonjour discovery service
|
||||
│ ├── SwarmDiscovery # Main discovery class
|
||||
│ ├── PeerInfo # Peer information dataclass
|
||||
│ └── create_discovery_service() # Factory function
|
||||
├── federation.py # Inter-swarm communication
|
||||
│ ├── FederationClient # HTTP client for peers
|
||||
│ ├── FederatedSwarm # Wraps swarm with federation
|
||||
│ ├── PeerVote # Vote from peer
|
||||
│ └── FederationResult # Result of federated generation
|
||||
└── (routes missing) # Should add federation routes
|
||||
|
||||
main.py # Should integrate federation here
|
||||
└── Currently: Just runs local swarm
|
||||
└── Should: Optionally run federated swarm with discovery
|
||||
```
|
||||
|
||||
## Scope
|
||||
|
||||
### In Scope
|
||||
- Automatic discovery of peers on same local network
|
||||
- Distributed generation across multiple machines
|
||||
- Consensus voting between local and peer responses
|
||||
- Health checking and peer timeout handling
|
||||
- Secure local network binding (no internet exposure)
|
||||
|
||||
### Out of Scope (Future)
|
||||
- Internet-wide federation (would need authentication/encryption)
|
||||
- Cross-platform federation (Mac ↔ Linux ↔ Windows)
|
||||
- Peer authentication/authorization
|
||||
- Encrypted peer communication
|
||||
- WAN federation through NAT traversal
|
||||
- Peer reputation/scoring system
|
||||
|
||||
## TODO
|
||||
|
||||
### Phase 1: Basic Integration (Minimum Viable)
|
||||
1. **Add `--federation` CLI flag** to `main.py`
|
||||
- Add argument parser entry
|
||||
- Conditionally enable federation
|
||||
|
||||
2. **Integrate discovery in main flow**
|
||||
```python
|
||||
# In main.py after swarm initialization:
|
||||
if args.federation:
|
||||
discovery = await create_discovery_service(args.port)
|
||||
await discovery.start_advertising(swarm_info)
|
||||
await discovery.start_discovery()
|
||||
```
|
||||
|
||||
3. **Add federation API routes** to `src/api/routes.py`
|
||||
- `POST /v1/federation/vote`
|
||||
- `GET /v1/federation/health`
|
||||
- `GET /v1/federation/peers` (list discovered peers)
|
||||
|
||||
4. **Create FederatedSwarm wrapper**
|
||||
```python
|
||||
# Replace: result = await swarm.generate(...)
|
||||
# With:
|
||||
if args.federation:
|
||||
federated = FederatedSwarm(swarm, discovery)
|
||||
result = await federated.generate_with_federation(...)
|
||||
else:
|
||||
result = await swarm.generate(...)
|
||||
```
|
||||
|
||||
### Phase 2: Polish
|
||||
5. **Add peer status display**
|
||||
- Show discovered peers in startup banner
|
||||
- Display peer count in status
|
||||
- Log when peers join/leave
|
||||
|
||||
6. **Handle edge cases**
|
||||
- No peers available (fallback to local only)
|
||||
- All peers timeout (graceful degradation)
|
||||
- Split-brain scenarios
|
||||
|
||||
7. **Configuration**
|
||||
- Config file support for federation settings
|
||||
- Manual peer list (bypass discovery)
|
||||
- Federation strategy selection
|
||||
|
||||
### Phase 3: Testing
|
||||
8. **Integration tests**
|
||||
- Two instances on same machine
|
||||
- Two instances on same network
|
||||
- Peer timeout handling
|
||||
- Consensus validation
|
||||
|
||||
## Usage (When Complete)
|
||||
|
||||
### Start Federated Mode
|
||||
```bash
|
||||
# On Mac 1 (192.168.1.100)
|
||||
python main.py --auto --federation
|
||||
|
||||
# On Mac 2 (192.168.1.101)
|
||||
python main.py --auto --federation
|
||||
|
||||
# Both will:
|
||||
# 1. Start local API on 192.168.x.x:17615
|
||||
# 2. Advertise via mDNS
|
||||
# 3. Discover each other within 5-10 seconds
|
||||
# 4. Distribute generation requests between them
|
||||
```
|
||||
|
||||
### Expected Behavior
|
||||
1. Both Macs advertise themselves via mDNS
|
||||
2. Each discovers the other within 10 seconds
|
||||
3. When a request comes in, both generate responses
|
||||
4. Consensus algorithm picks best response
|
||||
5. Result returned to client
|
||||
|
||||
## Benefits When Complete
|
||||
- **More workers**: Combine instances across machines
|
||||
- **Better consensus**: More responses = better selection
|
||||
- **Load balancing**: Distribute generation across devices
|
||||
- **Redundancy**: If one fails, others continue
|
||||
- **Heterogeneous hardware**: Mix Macs, PCs, servers
|
||||
|
||||
## Current Workaround
|
||||
Until federation is integrated, you can:
|
||||
1. Run instances independently on different machines
|
||||
2. Point clients to specific instances manually
|
||||
3. No automatic peer discovery or coordination
|
||||
@@ -1,597 +1,191 @@
|
||||
# Local Swarm
|
||||
|
||||
Automatically configure and run a swarm of small coding LLMs optimized for your hardware. Provides an OpenAI-compatible API for seamless integration with opencode and other tools.
|
||||
Run a swarm of local LLMs on your hardware. Multiple models work together to give you the best answer through consensus voting.
|
||||
|
||||
## Features
|
||||
## What It Does
|
||||
|
||||
- **Interactive Menu System**: Easy-to-use menu for selecting model configurations, browsing options, or creating custom setups
|
||||
- **Hardware Auto-Detection**: Automatically detects your GPU (NVIDIA, AMD, Intel), Apple Silicon, Qualcomm (Android), or CPU and selects optimal settings
|
||||
- **Smart Model Selection**: Chooses the best model, quantization, and instance count based on available VRAM/RAM
|
||||
- **Startup Summary**: Clear display of detected hardware, selected model, resource usage, and worker status
|
||||
- **Swarm Consensus**: Multiple LLM instances vote on the best response for higher quality outputs
|
||||
- **Network Federation**: Multiple machines on the same network can join into a "federated swarm" for distributed consensus
|
||||
- **OpenAI-Compatible API**: Drop-in replacement for OpenAI API at `http://localhost:8000/v1`
|
||||
- **MCP Server**: Model Context Protocol support for tight AI assistant integration
|
||||
- **Cross-Platform**: Works on Windows, macOS, Linux, and Android (via Termux) with automatic backend selection
|
||||
|
||||
## Documentation
|
||||
|
||||
- **[Quick Start](#quick-start)** - Get up and running in minutes
|
||||
- **[Complete Guide](docs/GUIDE.md)** - Comprehensive documentation
|
||||
- Opencode configuration examples
|
||||
- API reference
|
||||
- Troubleshooting guide
|
||||
- Performance tuning
|
||||
- Advanced configuration
|
||||
- **[Configuration](#configuration)** - Customize your setup
|
||||
- **[Interactive Mode](#interactive-mode)** - Using the menu system
|
||||
- **[Tips & Help](#tips--help)** - Learn about models, quantization, and optimization
|
||||
- **Auto-detects your hardware** (NVIDIA, AMD, Intel, Apple Silicon, Qualcomm, or CPU)
|
||||
- **Downloads and runs multiple LLM instances** optimized for your VRAM/RAM
|
||||
- **Uses consensus voting** - all instances answer, best response wins
|
||||
- **Connects multiple machines** on your network for a "hive mind" effect
|
||||
- **Provides an OpenAI-compatible API** at `http://localhost:17615/v1`
|
||||
|
||||
## Quick Start
|
||||
|
||||
### Installation
|
||||
|
||||
#### Windows (PowerShell)
|
||||
```powershell
|
||||
# Clone the repository
|
||||
```bash
|
||||
# Clone and install
|
||||
git clone https://github.com/yourusername/local_swarm.git
|
||||
cd local_swarm
|
||||
pip install -r requirements.txt
|
||||
|
||||
# Run installer
|
||||
.\scripts\install.bat
|
||||
```
|
||||
|
||||
#### macOS/Linux
|
||||
```bash
|
||||
# Clone the repository
|
||||
git clone https://github.com/yourusername/local_swarm.git
|
||||
cd local_swarm
|
||||
|
||||
# Run installer
|
||||
chmod +x scripts/install.sh
|
||||
./scripts/install.sh
|
||||
```
|
||||
|
||||
#### Android (Termux)
|
||||
```bash
|
||||
# In Termux app
|
||||
git clone https://github.com/yourusername/local_swarm.git
|
||||
cd local_swarm
|
||||
|
||||
# Run Termux installer
|
||||
chmod +x scripts/install-termux.sh
|
||||
./scripts/install-termux.sh
|
||||
```
|
||||
|
||||
**Note**: Android support is limited to small models (1-3B) due to memory constraints. Requires 8GB+ RAM.
|
||||
|
||||
### Usage
|
||||
|
||||
#### Start the Swarm
|
||||
```bash
|
||||
# Auto-detect hardware and start
|
||||
python -m local_swarm
|
||||
|
||||
# Or use the CLI
|
||||
# Run it
|
||||
python main.py
|
||||
```
|
||||
|
||||
On first run, the tool will:
|
||||
1. Scan your hardware (GPU, RAM, CPU)
|
||||
2. Select the optimal model and quantization
|
||||
On first run, it will:
|
||||
1. Detect your hardware
|
||||
2. Pick the best model and quantization
|
||||
3. Download the model (one-time)
|
||||
4. Start multiple instances based on available memory
|
||||
5. Expose the API at `http://localhost:8000`
|
||||
4. Start multiple LLM workers
|
||||
5. Expose the API at `http://localhost:17615`
|
||||
|
||||
Example startup output:
|
||||
```
|
||||
🔍 Detecting hardware...
|
||||
OS: Windows 11
|
||||
GPU: NVIDIA GeForce RTX 4060 Ti (16 GB VRAM)
|
||||
CPU: 16 cores
|
||||
RAM: 32 GB
|
||||
## Usage
|
||||
|
||||
📊 Optimal configuration:
|
||||
Model: Qwen 2.5 Coder 3B
|
||||
Quantization: Q4_K_M (1.8 GB per instance)
|
||||
Instances: 8 (using 14.4 GB VRAM)
|
||||
|
||||
⬇️ Downloading model...
|
||||
Progress: 100% ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ 1.8/1.8 GB
|
||||
|
||||
🚀 Starting swarm...
|
||||
Worker 1: Ready (GPU:0)
|
||||
Worker 2: Ready (GPU:0)
|
||||
...
|
||||
Worker 8: Ready (GPU:0)
|
||||
|
||||
✅ Local Swarm is running!
|
||||
API: http://localhost:8000/v1
|
||||
Models: http://localhost:8000/v1/models
|
||||
Health: http://localhost:8000/health
|
||||
|
||||
💡 Configure opencode to use:
|
||||
base_url: http://localhost:8000/v1
|
||||
api_key: any (not used)
|
||||
### Interactive Mode (default)
|
||||
```bash
|
||||
python main.py
|
||||
```
|
||||
|
||||
#### Configure opencode
|
||||
Shows a menu with:
|
||||
- Recommended configuration (auto-selected)
|
||||
- Browse all compatible models
|
||||
- Custom configuration wizard
|
||||
|
||||
Add to your opencode configuration:
|
||||
### Auto Mode (no menu)
|
||||
```bash
|
||||
python main.py --auto
|
||||
```
|
||||
|
||||
### With Other Options
|
||||
```bash
|
||||
python main.py --model qwen:3b:q4 # Use specific model
|
||||
python main.py --instances 4 # Force 4 workers
|
||||
python main.py --port 8080 # Custom port
|
||||
python main.py --detect # Show hardware info only
|
||||
python main.py --federation # Enable network federation
|
||||
python main.py --mcp # Enable MCP server
|
||||
```
|
||||
|
||||
## Connect to Opencode
|
||||
|
||||
Add to your opencode config:
|
||||
|
||||
```json
|
||||
{
|
||||
"model": {
|
||||
"provider": "openai",
|
||||
"base_url": "http://localhost:8000/v1",
|
||||
"base_url": "http://localhost:17615/v1",
|
||||
"api_key": "not-needed",
|
||||
"model": "local-swarm"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
#### MCP Server (Optional)
|
||||
## Network Federation (Hive Mind)
|
||||
|
||||
For tighter integration with AI assistants, enable the MCP server:
|
||||
Run on multiple machines to combine their power:
|
||||
|
||||
```bash
|
||||
python main.py --mcp
|
||||
# Machine 1 (Windows with RTX 4060)
|
||||
python main.py --auto --federation
|
||||
|
||||
# Machine 2 (Mac Mini M1)
|
||||
python main.py --auto --federation
|
||||
|
||||
# Machine 3 (Old laptop)
|
||||
python main.py --auto --federation
|
||||
```
|
||||
|
||||
This runs alongside the HTTP API and exposes tools AI assistants can use:
|
||||
- `get_hardware_info` - Query CPU, GPU, and RAM
|
||||
- `get_swarm_status` - Check worker health
|
||||
- `generate_code` - Generate code with consensus
|
||||
- `list_available_models` - See what models can run
|
||||
- `get_worker_details` - Get detailed worker statistics
|
||||
Machines auto-discover each other and vote together on every request.
|
||||
|
||||
MCP allows AI assistants to automatically query your hardware capabilities and select appropriate models.
|
||||
## How Consensus Works
|
||||
|
||||
1. Your prompt goes to all LLM instances
|
||||
2. Each instance generates a response independently
|
||||
3. The consensus algorithm picks the best answer:
|
||||
- **Similarity** (default): Groups responses by meaning, picks the largest group
|
||||
- **Quality**: Scores on completeness, code blocks, structure
|
||||
- **Fastest**: Returns the quickest response
|
||||
- **Majority**: Simple text match voting
|
||||
|
||||
## Configuration
|
||||
|
||||
Create a `config.yaml` file for customization:
|
||||
Create `config.yaml`:
|
||||
|
||||
```yaml
|
||||
server:
|
||||
host: "127.0.0.1"
|
||||
port: 8000
|
||||
port: 17615
|
||||
|
||||
swarm:
|
||||
consensus_strategy: "similarity" # similarity, quality, fastest
|
||||
consensus_strategy: "similarity" # similarity, quality, fastest, majority
|
||||
min_instances: 2
|
||||
max_instances: 8
|
||||
|
||||
hardware:
|
||||
gpu_memory_fraction: 1.0 # Use 100% of GPU VRAM
|
||||
ram_fraction: 0.5 # Use 50% of system RAM for CPU/Apple Silicon
|
||||
|
||||
federation:
|
||||
enabled: true
|
||||
discovery_port: 8765
|
||||
federation_port: 8766
|
||||
max_peers: 10
|
||||
|
||||
models:
|
||||
cache_dir: "~/.local_swarm/models"
|
||||
```
|
||||
|
||||
## CLI Options
|
||||
## Supported Hardware
|
||||
|
||||
```bash
|
||||
# Show hardware detection without starting
|
||||
python -m local_swarm --detect
|
||||
|
||||
# Use specific model
|
||||
python -m local_swarm --model qwen2.5-coder:3b:q4
|
||||
|
||||
# Use specific port
|
||||
python -m local_swarm --port 8080
|
||||
|
||||
# Force number of instances
|
||||
python -m local_swarm --instances 4
|
||||
|
||||
# Download models only (no server)
|
||||
python -m local_swarm --download-only
|
||||
|
||||
# Enable MCP server alongside HTTP API
|
||||
python -m local_swarm --mcp
|
||||
|
||||
# Show help
|
||||
python -m local_swarm --help
|
||||
|
||||
# Auto-detect without interactive menu
|
||||
python -m local_swarm --auto
|
||||
```
|
||||
|
||||
## Interactive Mode
|
||||
|
||||
By default, Local Swarm starts in **interactive mode** with a menu system:
|
||||
|
||||
```
|
||||
======================================================================
|
||||
Local Swarm - Model Selection
|
||||
======================================================================
|
||||
|
||||
----------------------------------------------------------------------
|
||||
Hardware Detection
|
||||
----------------------------------------------------------------------
|
||||
Operating System: Darwin
|
||||
CPU: 12 cores
|
||||
System RAM: 24.0 GB
|
||||
Available RAM: 6.2 GB
|
||||
|
||||
GPU Detected:
|
||||
Name: Apple Silicon GPU
|
||||
Type: Apple Silicon (Unified Memory)
|
||||
Total Memory: 24.0 GB
|
||||
|
||||
Available for LLMs: 12.0 GB
|
||||
(Using 50% of system RAM)
|
||||
|
||||
----------------------------------------------------------------------
|
||||
Configuration Options
|
||||
----------------------------------------------------------------------
|
||||
|
||||
💡 Recommended: Qwen 2.5 Coder 7b (q6_k)
|
||||
Instances: 2
|
||||
Memory: 12.0 GB
|
||||
|
||||
[1] Recommended Configuration - Qwen 2.5 Coder 7b (q6_k) with 2 instances
|
||||
[2] Browse All Configurations - See all models that fit your hardware
|
||||
[3] Custom Configuration - Specify exact model and number of instances
|
||||
|
||||
Enter your choice:
|
||||
```
|
||||
|
||||
### Menu Options
|
||||
|
||||
1. **Recommended Configuration** - Automatically selects the best model and instance count for your hardware
|
||||
2. **Browse All Configurations** - Shows all feasible models that fit in your available memory
|
||||
3. **Custom Configuration** - Step-by-step wizard to select:
|
||||
- Model family (Qwen, DeepSeek, CodeLlama)
|
||||
- Model size (3B, 7B, 14B)
|
||||
- Quantization level (Q4, Q5, Q6)
|
||||
- Number of instances (1 to max supported)
|
||||
|
||||
To skip the menu and use auto-detection, use `--auto` flag.
|
||||
|
||||
## Startup Summary
|
||||
|
||||
When starting, Local Swarm displays a comprehensive summary:
|
||||
|
||||
```
|
||||
======================================================================
|
||||
Local Swarm - Startup Summary
|
||||
======================================================================
|
||||
|
||||
----------------------------------------------------------------------
|
||||
Hardware Detection
|
||||
----------------------------------------------------------------------
|
||||
Operating System: Darwin
|
||||
CPU: 12 cores
|
||||
System RAM: 24.0 GB
|
||||
Available RAM: 6.2 GB
|
||||
|
||||
GPU Detected:
|
||||
Name: Apple Silicon GPU
|
||||
Type: Apple Silicon (Unified Memory)
|
||||
Total Memory: 24.0 GB
|
||||
|
||||
Available for LLMs: 12.0 GB
|
||||
|
||||
----------------------------------------------------------------------
|
||||
Model Configuration
|
||||
----------------------------------------------------------------------
|
||||
Model: Qwen 2.5 Coder 7b (q6_k)
|
||||
Description: Alibaba's code-focused model
|
||||
Instances: 2
|
||||
Memory per Instance: 6.0 GB
|
||||
Total Memory: 12.0 GB
|
||||
Utilization: 100.0% of available
|
||||
|
||||
======================================================================
|
||||
```
|
||||
|
||||
## How It Works
|
||||
|
||||
### Hardware Detection
|
||||
|
||||
The tool automatically detects your system:
|
||||
- **Windows**: NVIDIA (NVML), AMD (ROCm), Intel (OneAPI)
|
||||
- **macOS**: Apple Silicon via Metal, unified memory model
|
||||
- **Linux**: NVIDIA (NVML), AMD (ROCm), Intel (OneAPI/OpenCL)
|
||||
- **Android**: Qualcomm Adreno GPUs (via Termux)
|
||||
|
||||
**Supported Backends**:
|
||||
- **NVIDIA**: CUDA via llama.cpp
|
||||
- **AMD**: ROCm via llama.cpp (Linux, Windows experimental)
|
||||
- **Intel**: OneAPI/SYCL via llama.cpp
|
||||
- **Apple Silicon**: Metal via MLX
|
||||
- **Qualcomm**: CPU fallback on llama.cpp (Android/Termux)
|
||||
|
||||
### Model Selection
|
||||
|
||||
Based on available memory:
|
||||
1. **External GPU**: Use 100% of VRAM minus OS overhead
|
||||
2. **Apple Silicon**: Use 50% of unified RAM
|
||||
3. **CPU-only**: Use 50% of system RAM
|
||||
|
||||
The algorithm selects:
|
||||
- Largest model size that fits
|
||||
- Highest quantization quality possible
|
||||
- Maximum instances (2-8) based on memory
|
||||
|
||||
Example configurations:
|
||||
|
||||
| Hardware | Model | Quant | Instances | Memory Used |
|
||||
|----------|-------|-------|-----------|-------------|
|
||||
| RTX 4090 24GB | Qwen 2.5 14B | Q4_K_M | 2 | ~17.6 GB |
|
||||
| RTX 4060 Ti 16GB | Qwen 2.5 7B | Q4_K_M | 3 | ~13.5 GB |
|
||||
| RTX 4060 Ti 8GB | Qwen 2.5 3B | Q6_K | 4 | ~10.4 GB |
|
||||
| RX 7900 XTX 24GB | Qwen 2.5 14B | Q4_K_M | 2 | ~17.6 GB |
|
||||
| Arc A770 16GB | Qwen 2.5 7B | Q5_K_M | 2 | ~10.4 GB |
|
||||
| M4 Max 64GB | Qwen 2.5 14B | Q4_K_M | 4 | ~35.2 GB |
|
||||
| M3 Pro 36GB | Qwen 2.5 7B | Q4_K_M | 4 | ~18 GB |
|
||||
| M1 8GB | Qwen 2.5 3B | Q4_K_M | 2 | ~3.6 GB |
|
||||
| Snapdragon 8 Gen 3 | Qwen 2.5 3B | Q4_K_M | 1 | ~1.8 GB |
|
||||
| CPU 32GB | Qwen 2.5 3B | Q4_K_M | 8 | ~14.4 GB |
|
||||
| **Federated (3 machines)** | **Qwen 2.5 7B** | **Q4_K_M** | **9** | **~40.5 GB** |
|
||||
|
||||
### Swarm Consensus
|
||||
|
||||
For each request, the swarm:
|
||||
1. Sends the prompt to all running instances
|
||||
2. Collects responses in parallel
|
||||
3. Runs consensus algorithm:
|
||||
- **Similarity**: Groups responses by semantic similarity, returns largest group
|
||||
- **Quality**: Scores responses on completeness and code quality
|
||||
- **Fastest**: Returns the quickest response
|
||||
4. Returns the winning response via OpenAI-compatible API
|
||||
|
||||
### Network Federation
|
||||
|
||||
Run Local Swarm on multiple machines in the same network to create a "federated swarm":
|
||||
|
||||
**Example Setup**:
|
||||
- Windows PC (RTX 4060 Ti): 4 instances
|
||||
- Mac Mini (M1): 2 instances
|
||||
- MacBook (M4): 3 instances
|
||||
- Total: 9 instances voting on every request
|
||||
|
||||
**How it works**:
|
||||
1. Each machine auto-discovers others via mDNS/Bonjour
|
||||
2. Each swarm generates responses independently
|
||||
3. Local consensus picks best response per machine
|
||||
4. Cross-swarm consensus votes across all machines
|
||||
5. Best response returned to client
|
||||
|
||||
**To enable federation**:
|
||||
```yaml
|
||||
federation:
|
||||
enabled: true
|
||||
discovery_port: 8765 # mDNS/Bonjour discovery
|
||||
federation_port: 8766 # Inter-swarm communication
|
||||
```
|
||||
|
||||
Machines will automatically discover each other within 10 seconds.
|
||||
|
||||
## API Endpoints
|
||||
|
||||
### GET /v1/models
|
||||
List available models
|
||||
|
||||
### POST /v1/chat/completions
|
||||
Chat completion with consensus
|
||||
|
||||
**Request**:
|
||||
```json
|
||||
{
|
||||
"model": "local-swarm",
|
||||
"messages": [
|
||||
{"role": "user", "content": "Write a Python function to sort a list"}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
**Response**:
|
||||
```json
|
||||
{
|
||||
"id": "chatcmpl-abc123",
|
||||
"object": "chat.completion",
|
||||
"created": 1234567890,
|
||||
"model": "local-swarm",
|
||||
"choices": [{
|
||||
"index": 0,
|
||||
"message": {
|
||||
"role": "assistant",
|
||||
"content": "def sort_list(lst):\n return sorted(lst)"
|
||||
},
|
||||
"finish_reason": "stop"
|
||||
}]
|
||||
}
|
||||
```
|
||||
|
||||
### GET /health
|
||||
Health check
|
||||
|
||||
### GET /metrics
|
||||
Prometheus metrics (optional)
|
||||
| Hardware | Backend | Notes |
|
||||
|----------|---------|-------|
|
||||
| NVIDIA GPU | llama.cpp (CUDA) | Best performance |
|
||||
| AMD GPU | llama.cpp (ROCm) | Linux/Windows |
|
||||
| Intel GPU | llama.cpp (SYCL) | Linux/Windows |
|
||||
| Apple Silicon | MLX | Native Metal |
|
||||
| Qualcomm | llama.cpp (CPU) | Android/Termux |
|
||||
| CPU-only | llama.cpp | Slower but works |
|
||||
|
||||
## Supported Models
|
||||
|
||||
Currently supported models (auto-selected based on hardware):
|
||||
- **Qwen 2.5 Coder** (3B, 7B, 14B) - Recommended
|
||||
- **DeepSeek Coder** (1.3B, 6.7B, 33B)
|
||||
- **CodeLlama** (7B, 13B, 34B)
|
||||
|
||||
- **Qwen 2.5 Coder** (3B, 7B, 14B) - Recommended for coding tasks
|
||||
- **DeepSeek Coder** (1.3B, 6.7B, 33B) - Good alternative
|
||||
- **CodeLlama** (7B, 13B, 34B) - Meta's code model
|
||||
All support GGUF quantization (Q4_K_M recommended).
|
||||
|
||||
All models support GGUF quantization:
|
||||
- Q4_K_M - Good quality, smallest size (recommended)
|
||||
- Q5_K_M - Better quality
|
||||
- Q6_K - Best quality
|
||||
## API Endpoints
|
||||
|
||||
- `GET /v1/models` - List available models
|
||||
- `POST /v1/chat/completions` - Chat completion with consensus
|
||||
- `GET /health` - Health check
|
||||
- `GET /v1/federation/peers` - List discovered peers (when federation enabled)
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Out of Memory
|
||||
If you get OOM errors:
|
||||
```bash
|
||||
# Reduce instances
|
||||
python -m local_swarm --instances 2
|
||||
|
||||
# Or use smaller model
|
||||
python -m local_swarm --model qwen2.5-coder:3b:q4
|
||||
python main.py --instances 2 # Reduce workers
|
||||
python main.py --model qwen:3b:q4 # Use smaller model
|
||||
```
|
||||
|
||||
### Slow Performance
|
||||
- Check GPU utilization with `nvidia-smi` (NVIDIA) or Activity Monitor (macOS)
|
||||
- Ensure model is cached (first run downloads to `~/.local_swarm/models`)
|
||||
- Try reducing instances to avoid contention
|
||||
- Check GPU utilization with `nvidia-smi`
|
||||
- Reduce instances to avoid contention
|
||||
- Use Q4 quantization instead of Q6
|
||||
|
||||
### Windows: CUDA not detected
|
||||
Make sure NVIDIA drivers are installed:
|
||||
### CUDA Not Detected (Windows)
|
||||
```powershell
|
||||
nvidia-smi
|
||||
nvidia-smi # Check drivers
|
||||
pip uninstall llama-cpp-python
|
||||
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121
|
||||
```
|
||||
If this fails, reinstall drivers from nvidia.com
|
||||
|
||||
### macOS: MLX not found
|
||||
### macOS: MLX Not Found
|
||||
```bash
|
||||
pip install mlx-lm
|
||||
```
|
||||
|
||||
### Linux: AMD GPU not detected
|
||||
Ensure ROCm is installed:
|
||||
```bash
|
||||
rocm-smi
|
||||
```
|
||||
If not found, install from https://www.amd.com/en/developer/rocm-hub.html
|
||||
|
||||
### Linux: Intel GPU not detected
|
||||
Install Intel oneAPI:
|
||||
```bash
|
||||
# Ubuntu/Debian
|
||||
wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB | sudo gpg --dearmor -o /usr/share/keyrings/intel-oneapi-archive-keyring.gpg
|
||||
echo "deb [signed-by=/usr/share/keyrings/intel-oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" | sudo tee /etc/apt/sources.list.d/oneAPI.list
|
||||
sudo apt update
|
||||
sudo apt install intel-basekit
|
||||
```
|
||||
|
||||
### Android: Termux issues
|
||||
- Ensure Termux is installed from F-Droid (not Play Store)
|
||||
- Run `pkg update` before installation
|
||||
- Limited to small models (1-3B) due to RAM constraints
|
||||
- Use CPU backend only (no GPU acceleration on Android yet)
|
||||
|
||||
## Requirements
|
||||
|
||||
- Python 3.9+
|
||||
- 4GB+ RAM (8GB+ recommended)
|
||||
- Optional: NVIDIA/AMD/Intel GPU with 4GB+ VRAM
|
||||
- Optional: Apple Silicon Mac
|
||||
- Optional: Android device with 8GB+ RAM (via Termux)
|
||||
|
||||
## Development
|
||||
|
||||
```bash
|
||||
# Install dev dependencies
|
||||
pip install -r requirements-dev.txt
|
||||
|
||||
# Run tests
|
||||
pytest
|
||||
|
||||
# Run specific platform tests
|
||||
pytest tests/test_hardware.py -v
|
||||
|
||||
# Format code
|
||||
black src/
|
||||
ruff check src/
|
||||
```
|
||||
|
||||
## Architecture
|
||||
|
||||
### Single Machine
|
||||
## Project Structure
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────┐
|
||||
│ OpenAI API Client │
|
||||
│ (opencode, etc.) │
|
||||
└─────────────┬───────────────────────┘
|
||||
│ HTTP
|
||||
▼
|
||||
┌─────────────────────────────────────┐
|
||||
│ Local Swarm API Server │
|
||||
│ (FastAPI / localhost:8000) │
|
||||
└─────────────┬───────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────┐
|
||||
│ Swarm Manager │
|
||||
│ ┌─────────┐ ┌─────────┐ │
|
||||
│ │ Worker 1│ │ Worker 2│ ... │
|
||||
│ │(LLM #1) │ │(LLM #2) │ │
|
||||
│ └────┬────┘ └────┬────┘ │
|
||||
│ │ │ │
|
||||
│ └─────┬─────┘ │
|
||||
│ ▼ │
|
||||
│ Consensus Engine │
|
||||
└─────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────┐
|
||||
│ Backend (llama.cpp / MLX) │
|
||||
│ ┌─────────────────────┐ │
|
||||
│ │ GGUF/MLX Model │ │
|
||||
│ │ (Qwen/Codellama) │ │
|
||||
│ └─────────────────────┘ │
|
||||
└─────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────┐
|
||||
│ Hardware (GPU/CPU/Apple Silicon) │
|
||||
└─────────────────────────────────────┘
|
||||
```
|
||||
local_swarm/
|
||||
├── main.py # CLI entry point
|
||||
├── src/
|
||||
│ ├── hardware/ # GPU detection (NVIDIA, AMD, Intel, Apple, Qualcomm)
|
||||
│ ├── models/ # Model registry, selection, downloading
|
||||
│ ├── backends/ # llama.cpp and MLX backends
|
||||
│ ├── swarm/ # Worker management and consensus
|
||||
│ ├── network/ # Federation and peer discovery
|
||||
│ ├── api/ # OpenAI-compatible API server
|
||||
│ └── tools/ # Tool execution (read, write, bash)
|
||||
└── docs/ # Documentation
|
||||
|
||||
### Federated Swarm (Multiple Machines)
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ Local Network │
|
||||
│ │
|
||||
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
|
||||
│ │ Windows PC │ │ Mac Mini │ │ MacBook │ │
|
||||
│ │ (RTX 4060) │ │ (M1) │ │ (M4) │ │
|
||||
│ │ 4 instances │ │ 2 instances │ │ 3 instances │ │
|
||||
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
|
||||
│ │ │ │ │
|
||||
│ │ │ │ │
|
||||
│ └───────────────────┼───────────────────┘ │
|
||||
│ │ │
|
||||
│ ┌────────┴────────┐ │
|
||||
│ │ Cross-Swarm │ │
|
||||
│ │ Consensus │ │
|
||||
│ └────────┬────────┘ │
|
||||
│ │ │
|
||||
│ ┌────────▼────────┐ │
|
||||
│ │ opencode │ │
|
||||
│ └─────────────────┘ │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
## License
|
||||
|
||||
MIT License - See LICENSE file
|
||||
|
||||
## Contributing
|
||||
|
||||
Contributions welcome! Please read CONTRIBUTING.md first.
|
||||
|
||||
## Acknowledgments
|
||||
|
||||
- [llama.cpp](https://github.com/ggerganov/llama.cpp) - Inference engine (CUDA/ROCm/SYCL)
|
||||
- [MLX](https://github.com/ml-explore/mlx) - Apple Silicon backend
|
||||
- [Qwen](https://github.com/QwenLM/Qwen) - Model family
|
||||
- [DeepSeek](https://github.com/deepseek-ai/deepseek-coder) - Model family
|
||||
- [HuggingFace](https://huggingface.co) - Model hosting
|
||||
- [ROCm](https://github.com/RadeonOpenCompute/ROCm) - AMD GPU support
|
||||
- [oneAPI](https://www.intel.com/content/www/us/en/developer/tools/oneapi/overview.html) - Intel GPU support
|
||||
- [Termux](https://termux.dev) - Android terminal emulator
|
||||
MIT License
|
||||
|
||||
@@ -1,106 +0,0 @@
|
||||
Here's a comprehensive review of your project. It's well-architected overall, but there are several issues worth addressing.
|
||||
|
||||
---
|
||||
Critical Bugs
|
||||
|
||||
1. src/network/discovery.py:128 — asyncio.create_task() called from a non-asyncio thread
|
||||
Zeroconf's ServiceBrowser invokes _on_service_state_change from a background thread, but asyncio.create_task() requires the asyncio event loop in
|
||||
the current thread. This will crash with RuntimeError: no current event loop. Use asyncio.run_coroutine_threadsafe(coro, loop) instead.
|
||||
|
||||
2. src/network/discovery.py:161 — int() on bytes raises TypeError
|
||||
int(properties.get(b"instances", b"0")) — in Python 3, int(b"0") is a TypeError. Need .decode() first.
|
||||
|
||||
3. src/hardware/detector.py:149,174 — Android/Qualcomm detection is unreachable
|
||||
platform.system() returns "Linux" on Android, not "android". So the code enters the Linux branch, tries NVIDIA/AMD/Intel, fails, and returns None —
|
||||
never reaching Qualcomm detection.
|
||||
|
||||
4. src/api/routes.py:77 — response_model breaks streaming
|
||||
The route declares response_model=ChatCompletionResponse, but when request.stream=True, it returns a StreamingResponse. FastAPI will try to
|
||||
validate the streaming response against the Pydantic model and fail.
|
||||
|
||||
---
|
||||
High Severity
|
||||
|
||||
5. src/backends/llamacpp.py:85-94 and src/backends/mlx.py:88-96 — Blocking calls in async methods
|
||||
Both backends call synchronous inference (self._llm(...), mlx_generate(...)) directly inside async def methods. This blocks the entire event loop,
|
||||
freezing the API server during inference. Wrap in await asyncio.to_thread(...).
|
||||
|
||||
6. src/backends/llamacpp.py:29 — Lock declared but never initialized
|
||||
self._lock = None is never replaced with an actual asyncio.Lock(), so there's no concurrency protection when multiple requests hit the same backend
|
||||
instance.
|
||||
|
||||
7. src/swarm/consensus.py:85,89 — Blocking I/O in async context
|
||||
SentenceTransformer('all-MiniLM-L6-v2') downloads/loads a model synchronously, and .encode() is CPU-bound. Both freeze the event loop.
|
||||
|
||||
8. src/hardware/amd.py:80 — VRAM regex matches wrong number
|
||||
re.search(r'(\d+)', line) on a line like GPU[0] : VRAM Total Memory (B): 17179869184 matches 0 (from GPU[0]), not the VRAM value.
|
||||
|
||||
9. src/models/downloader.py:79-88 — Partial downloads cached as valid
|
||||
If a download is interrupted, the partial file remains. is_model_cached() sees size > 0 and treats it as valid. Should download to a .tmp file and
|
||||
rename atomically on completion.
|
||||
|
||||
10. src/network/federation.py:253-277 — best_of_n strategy is non-functional
|
||||
The code creates GenerationResponse objects but never uses them, then just returns the local response. This strategy is dead code.
|
||||
|
||||
---
|
||||
Medium Severity
|
||||
|
||||
11. src/models/selector.py:182-184 — Memory calculation uses wrong instance count
|
||||
total_memory_gb = smallest_quant.vram_gb * instances uses the pre-clamped value, but instances gets max(instances, 1) on the next line. Data
|
||||
inconsistency.
|
||||
|
||||
12. src/models/selector.py:65 — calculate_max_instances returns infeasible count
|
||||
Returns MIN_INSTANCES (2) even when only 0-1 instances fit in memory. _try_smallest_variant calls this without the memory guard that _try_model
|
||||
has.
|
||||
|
||||
13. src/hardware/detector.py:87-88 — NVML resource leak
|
||||
pynvml.nvmlInit() is called but nvmlShutdown() is never called. Need a try/finally.
|
||||
|
||||
14. src/api/server.py:60-66 — Invalid CORS configuration
|
||||
allow_origins=["*"] with allow_credentials=True violates the CORS spec. Browsers will reject this.
|
||||
|
||||
15. src/swarm/consensus.py:186-199 — _majority_vote doesn't do majority voting
|
||||
It picks the median-length response, not the most common one. Name and docstring are misleading.
|
||||
|
||||
16. src/interactive.py:226,368,458 — Recursive menu navigation risks stack overflow
|
||||
Menu functions call each other recursively. Repeated back-and-forth navigation can blow the stack. Use a loop-based state machine instead.
|
||||
|
||||
17. Multiple files — Bare except: clauses
|
||||
llamacpp.py:157,187, mlx.py:141, detector.py:108,190, amd.py:214, intel.py:220,248, qualcomm.py:185, discovery.py:236, federation.py:116,
|
||||
updater.py:141,218,231 — all catch SystemExit and KeyboardInterrupt. Use except Exception: instead.
|
||||
|
||||
---
|
||||
Low Severity / Code Quality
|
||||
|
||||
18. src/api/routes.py:112,133,147 — .json() deprecated in Pydantic v2. Use .model_dump_json().
|
||||
|
||||
19. src/backends/mlx.py:59-63 — GGUF loading via MLX is suspect. Passing the parent directory of a GGUF file to mlx_lm.load() likely won't work.
|
||||
|
||||
20. src/swarm/consensus.py:233 — False-positive list detection. Checks for -, *, 1., 2. which match hyphens in code, multiplication operators,
|
||||
version numbers, etc.
|
||||
|
||||
21. src/network/discovery.py:56 — Dict[str, any] should be Dict[str, Any] (capital A).
|
||||
|
||||
22. src/mcp_server.py:15-18 — Unused imports (ImageContent, Resource, EmbeddedResource, LoggingLevel).
|
||||
|
||||
23. src/models/downloader.py:74,118 — timeout=30 is connect-only, no read timeout. Multi-GB downloads can hang on stalled reads.
|
||||
|
||||
24. src/models/downloader.py — No checksum verification after download. Corrupted files are silently cached.
|
||||
|
||||
25. Tests directory is empty — tests/__init__.py exists but no actual tests.
|
||||
|
||||
---
|
||||
Suggested Improvements
|
||||
|
||||
1. Wrap all blocking inference in asyncio.to_thread() — this is the single most impactful fix. Without it, the API server can only handle one
|
||||
request at a time.
|
||||
2. Atomic downloads — download to .part file, rename on success, verify checksum against HuggingFace metadata.
|
||||
3. Replace recursive menus with a loop-based state machine — e.g. state = "main" in a while True loop with if state == "main": ... branches.
|
||||
4. Add proper logging — replace all print() calls with logging.getLogger(__name__). The codebase uses print() everywhere, making it hard to control
|
||||
verbosity.
|
||||
5. Fix the Android detection path — check is_termux() or /system/build.prop existence early in detect_gpu() before the platform branching.
|
||||
6. Add integration tests — even simple smoke tests (hardware detection returns valid data, model selection picks something reasonable, API server
|
||||
starts and responds to /health) would catch regressions.
|
||||
7. Use aiohttp.ClientSession as async context manager in federation to ensure proper cleanup.
|
||||
8. Consider separating streaming and non-streaming API routes — this avoids the response_model conflict and makes the code clearer.
|
||||
|
||||
@@ -1,134 +0,0 @@
|
||||
# Local Swarm TODO / Future Enhancements
|
||||
|
||||
## Context Window Optimization (For Long Context 30K+)
|
||||
|
||||
Based on docs/CONTEXT.md, implement context compression for memory-constrained setups:
|
||||
|
||||
### Option 2: Context Compression (Recommended for 16GB VRAM)
|
||||
|
||||
**Stage 1: Compression Swarm (3-5 workers)**
|
||||
- Split 60K input into 6x 10K chunks
|
||||
- Each worker summarizes one chunk
|
||||
- Aggregate summaries into 8K compressed context
|
||||
- Added latency: ~2-3 seconds
|
||||
|
||||
**Stage 2: Solution Swarm (N workers)**
|
||||
- Each worker gets 8K compressed + 2K relevant original
|
||||
- Generate solutions independently
|
||||
- Vote on best response
|
||||
|
||||
**Benefits:**
|
||||
- Works with standard 8K models
|
||||
- Maintains swarm consensus architecture
|
||||
- 2-3x more workers possible
|
||||
|
||||
**Implementation:**
|
||||
```python
|
||||
# New: CompressionEngine class
|
||||
class CompressionEngine:
|
||||
def compress(self, text: str, target_tokens: int) -> str:
|
||||
# Split into chunks
|
||||
# Parallel summarization
|
||||
# Aggregate results
|
||||
pass
|
||||
```
|
||||
|
||||
### Option 3: Hierarchical RAG (For 100K+ contexts)
|
||||
|
||||
**Tier 1: Indexing**
|
||||
- Embed context into vector database
|
||||
- Build searchable knowledge graph
|
||||
|
||||
**Tier 2: Retrieval + Generation**
|
||||
- Query index for relevant context
|
||||
- Each worker gets ~6K retrieved + 2K raw
|
||||
|
||||
**Tier 3: Voting**
|
||||
- Rerank and consensus
|
||||
|
||||
**Use case:** Codebase-wide analysis, large document processing
|
||||
|
||||
---
|
||||
|
||||
## Tool Execution Enhancements
|
||||
|
||||
### Streaming Tool Results
|
||||
- Stream long file reads progressively
|
||||
- Show bash command output in real-time
|
||||
- Progress indicators for large operations
|
||||
|
||||
### Tool Permissions
|
||||
- Configurable permission levels per tool
|
||||
- Approval required for destructive operations (rm, overwrite)
|
||||
- Audit log of all tool executions
|
||||
|
||||
### Tool Result Caching
|
||||
- Cache file reads (hash-based)
|
||||
- Invalidate on file modification
|
||||
- Reduce redundant disk I/O
|
||||
|
||||
---
|
||||
|
||||
## Federation Improvements
|
||||
|
||||
### Automatic Peer Discovery
|
||||
- Better mDNS reliability
|
||||
- Fallback to broadcast/multicast
|
||||
- Manual peer list persistence
|
||||
|
||||
### Load Balancing
|
||||
- Distribute requests across peers based on:
|
||||
- Current load (active workers)
|
||||
- Latency (response time)
|
||||
- Capability (model quality)
|
||||
|
||||
### Fault Tolerance
|
||||
- Automatic peer failover
|
||||
- Retry with different peers
|
||||
- Degraded mode (fewer voters)
|
||||
|
||||
---
|
||||
|
||||
## UI/UX Enhancements
|
||||
|
||||
### Web Dashboard
|
||||
- Real-time worker status visualization
|
||||
- Generation progress bars
|
||||
- Tool execution log viewer
|
||||
- Configuration management UI
|
||||
|
||||
### Better Error Messages
|
||||
- Clear explanations of OOM errors
|
||||
- Suggested configurations based on hardware
|
||||
- Model compatibility checker
|
||||
|
||||
---
|
||||
|
||||
## Performance Optimizations
|
||||
|
||||
### Speculative Decoding
|
||||
- Small draft model generates tokens
|
||||
- Large model verifies (2-3x speedup)
|
||||
- Requires draft model download
|
||||
|
||||
### KV Cache Optimization
|
||||
- PagedAttention (vLLM-style)
|
||||
- Memory-efficient attention states
|
||||
- Better long-context performance
|
||||
|
||||
### Model Quantization
|
||||
- Support for GPTQ/AWQ quantization
|
||||
- 2-3x smaller models with minimal quality loss
|
||||
- Enable larger models on same hardware
|
||||
|
||||
---
|
||||
|
||||
## Completed ✓
|
||||
|
||||
- [x] Tool execution architecture (local + remote)
|
||||
- [x] Simplified tool instructions (300 tokens vs 40k)
|
||||
- [x] Federation with peer discovery
|
||||
- [x] Hardware auto-detection
|
||||
- [x] MLX backend for Apple Silicon
|
||||
- [x] Consensus voting strategies
|
||||
- [x] Model auto-selection based on VRAM
|
||||
@@ -0,0 +1,12 @@
|
||||
Use tools to execute commands and fetch information. Output only tool calls.
|
||||
|
||||
Format:
|
||||
TOOL: bash
|
||||
ARGUMENTS: {"command": "ls -la", "description": "Lists files in directory"}
|
||||
|
||||
TOOL: webfetch
|
||||
ARGUMENTS: {"url": "https://example.com", "format": "markdown"}
|
||||
|
||||
Available tools: bash, webfetch
|
||||
|
||||
No explanations. No numbered lists. No markdown. Only tool calls.
|
||||
@@ -0,0 +1,115 @@
|
||||
# Local Swarm Architecture
|
||||
|
||||
## Core Concept
|
||||
|
||||
Deploy multiple LLM instances on your hardware. Each instance processes the same input independently, then they vote on the best answer. Connect multiple machines running this to create a "hive mind" utilizing all your old hardware.
|
||||
|
||||
## How It Works
|
||||
|
||||
```
|
||||
┌─────────────────┐ ┌─────────────────────────────────────┐
|
||||
│ Your Prompt │────▶│ Swarm Manager │
|
||||
└─────────────────┘ │ ┌─────────┐ ┌─────────┐ ┌─────────┐│
|
||||
│ │Worker 1 │ │Worker 2 │ │Worker 3 ││
|
||||
│ │ (LLM) │ │ (LLM) │ │ (LLM) ││
|
||||
│ └────┬────┘ └────┬────┘ └────┬────┘│
|
||||
│ └───────────┼───────────┘ │
|
||||
│ ▼ │
|
||||
│ Consensus Engine │
|
||||
│ (Picks best answer) │
|
||||
└───────────────────┬─────────────────┘
|
||||
▼
|
||||
┌───────────────┐
|
||||
│ Best Response │
|
||||
└───────────────┘
|
||||
```
|
||||
|
||||
## Components
|
||||
|
||||
### 1. Hardware Detection (`src/hardware/`)
|
||||
Detects your GPU and available memory to optimize model selection.
|
||||
|
||||
- **NVIDIA** - pynvml
|
||||
- **AMD** - rocm-smi
|
||||
- **Intel** - sycl-ls
|
||||
- **Apple Silicon** - sysctl/unified memory
|
||||
- **Qualcomm** - Android/Termux detection
|
||||
- **CPU** - psutil
|
||||
|
||||
### 2. Model Selection (`src/models/`)
|
||||
Automatically picks the best model based on available memory:
|
||||
|
||||
```
|
||||
Available Memory → Model Size → Quantization → Instance Count
|
||||
24 GB → 14B → Q4_K_M → 2-3 instances
|
||||
16 GB → 7B → Q4_K_M → 3-4 instances
|
||||
8 GB → 3B → Q6_K → 2-3 instances
|
||||
```
|
||||
|
||||
### 3. Backends (`src/backends/`)
|
||||
Run the actual LLM inference:
|
||||
|
||||
- **llama.cpp** - CUDA, ROCm, SYCL, CPU (cross-platform)
|
||||
- **MLX** - Apple Silicon optimized
|
||||
|
||||
### 4. Swarm Management (`src/swarm/`)
|
||||
Manages multiple LLM workers and consensus voting.
|
||||
|
||||
**Workers**: Each runs an independent LLM instance
|
||||
**Consensus**: Picks the best response using:
|
||||
- Similarity (semantic grouping)
|
||||
- Quality (code blocks, structure)
|
||||
- Fastest (latency)
|
||||
- Majority (exact match)
|
||||
|
||||
### 5. Network Federation (`src/network/`)
|
||||
Connect multiple machines into a distributed swarm:
|
||||
|
||||
```
|
||||
Machine 1 (4 workers) ──┐
|
||||
Machine 2 (2 workers) ──┼──▶ Cross-Swarm Consensus ──▶ Best Answer
|
||||
Machine 3 (3 workers) ──┘
|
||||
```
|
||||
|
||||
**Discovery**: mDNS/Bonjour auto-discovery
|
||||
**Protocol**: HTTP between peers
|
||||
**Voting**: Two-phase (local consensus → global consensus)
|
||||
|
||||
### 6. API (`src/api/`)
|
||||
OpenAI-compatible REST API:
|
||||
|
||||
- `POST /v1/chat/completions` - Main endpoint
|
||||
- `GET /v1/models` - List models
|
||||
- `GET /health` - Health check
|
||||
- Federation endpoints when enabled
|
||||
|
||||
### 7. Tools (`src/tools/`)
|
||||
Optional tool execution for enhanced capabilities:
|
||||
|
||||
- `read_file` - Read files
|
||||
- `write_file` - Write files
|
||||
- `execute_bash` - Run shell commands
|
||||
|
||||
## Data Flow
|
||||
|
||||
1. **Request** comes in via API
|
||||
2. **Swarm Manager** sends to all workers
|
||||
3. **Workers** generate responses in parallel
|
||||
4. **Consensus** picks the best answer
|
||||
5. **Response** returned to client
|
||||
|
||||
## Memory Model
|
||||
|
||||
- **External GPU**: Use 90% of VRAM
|
||||
- **Apple Silicon**: Use RAM - 4GB buffer
|
||||
- **CPU-only**: Use RAM - 4GB buffer
|
||||
|
||||
Each worker loads the full model independently (no sharing).
|
||||
|
||||
## Future Ideas
|
||||
|
||||
- Context compression for long inputs
|
||||
- CPU offloading for memory-constrained systems
|
||||
- RAG integration for knowledge bases
|
||||
- Speculative decoding for speed
|
||||
|
||||
-210
@@ -1,210 +0,0 @@
|
||||
# Context Window Handling in Local Swarm
|
||||
|
||||
## Overview
|
||||
|
||||
This document summarizes how context windows work in swarm architectures and the design decisions made for Local Swarm.
|
||||
|
||||
## The Core Challenge
|
||||
|
||||
When running multiple LLM workers (instances) for consensus voting, each worker needs to process the input. For long contexts (30K-60K+ tokens), this creates memory pressure:
|
||||
|
||||
- **7B model at 32K context:** ~8GB VRAM per worker
|
||||
- **7B model at 64K context:** ~14GB VRAM per worker
|
||||
- **Input duplication:** Each worker processes the full input independently
|
||||
|
||||
## Industry Approaches
|
||||
|
||||
### 1. Mixture of Experts (MoE)
|
||||
**Used by:** GPT-4, Mixtral 8x7B
|
||||
|
||||
- Full input goes to all "expert" sub-models
|
||||
- Router network decides which experts to activate
|
||||
- Each expert is smaller (e.g., 8x7B vs 1x56B equivalent)
|
||||
- **Trade-off:** More parameters total, but only a subset active per token
|
||||
|
||||
### 2. Ensemble Voting (Local Swarm's Approach)
|
||||
**Characteristics:**
|
||||
|
||||
- Full input to all workers
|
||||
- Each worker generates independently
|
||||
- Vote on final outputs
|
||||
- **Pros:** True parallel processing, diverse perspectives
|
||||
- **Cons:** 100% input duplication, memory intensive
|
||||
|
||||
### 3. Pipeline/Multi-Agent
|
||||
**Used by:** LangChain, AutoGPT
|
||||
|
||||
- Different workers get different subtasks
|
||||
- Sequential processing (not parallel)
|
||||
- **Pros:** Efficient memory usage, specialization
|
||||
- **Cons:** Loses swarm consensus benefit, higher latency
|
||||
|
||||
### 4. Speculative Decoding
|
||||
**Used by:** vLLM, Text Generation Inference
|
||||
|
||||
- Small "draft" model processes input
|
||||
- Large model verifies (doesn't reprocess)
|
||||
- **Pros:** 2-3x speedup
|
||||
- **Cons:** Complex implementation
|
||||
|
||||
## Memory Offloading
|
||||
|
||||
### What It Is
|
||||
Moving part of the model's state from GPU VRAM to system RAM:
|
||||
|
||||
- **Hot context** (active tokens) → GPU VRAM (fast)
|
||||
- **Cold context** (earlier tokens) → System RAM (slower)
|
||||
|
||||
### Performance Impact
|
||||
| Configuration | Speed | Memory |
|
||||
|---------------|-------|--------|
|
||||
| 100% GPU | 100% | 20GB VRAM |
|
||||
| 50% offload | 75% | 10GB VRAM + 10GB RAM |
|
||||
| 80% offload | 60% | 4GB VRAM + 16GB RAM |
|
||||
|
||||
### When to Use
|
||||
- **Recommended:** When you have plenty of RAM (32GB+) but limited VRAM (8-12GB)
|
||||
- **Trade-off:** 25-40% slower, but can run 2-3x more workers
|
||||
- **Implementation:** vLLM, DeepSpeed ZeRO-Infinity, llama.cpp
|
||||
|
||||
## Can Workers Share Context?
|
||||
|
||||
### The Short Answer
|
||||
**Raw input tokens:** Yes (negligible memory)
|
||||
**KV Cache (attention states):** No (99% of memory, unique per worker)
|
||||
|
||||
### Why KV Cache Can't Be Shared
|
||||
|
||||
The attention mechanism requires unique Key/Value tensors per token position:
|
||||
|
||||
```
|
||||
Token 1: [K1, V1] ← unique to this position
|
||||
Token 2: [K2, V2] ← depends on Token 1
|
||||
...
|
||||
Token N: [KN, VN] ← depends on all previous
|
||||
```
|
||||
|
||||
Even with the same input:
|
||||
- Different random seeds → different attention patterns
|
||||
- Each worker builds its own understanding
|
||||
- The "notes and highlights" (KV cache) are unique per worker
|
||||
|
||||
### Analogy
|
||||
Five people reading the same book:
|
||||
- ✅ **Can share:** The physical book (input tokens)
|
||||
- ❌ **Can't share:** Their notes, highlights, thoughts (KV cache)
|
||||
|
||||
## Options for Long Context (30K-60K+ tokens)
|
||||
|
||||
### Option 1: Long-Context Models
|
||||
**Models:** Phi-3.5 Mini, Llama 3.1/3.2, Qwen 2.5 (128K context)
|
||||
|
||||
**Pros:**
|
||||
- Simplest architecture
|
||||
- True parallel swarm voting
|
||||
- No preprocessing
|
||||
|
||||
**Cons:**
|
||||
- Requires 8-12GB VRAM per worker at 60K context
|
||||
- Limited model selection
|
||||
|
||||
**Best for:** Users with high-end GPUs (RTX 4090, 24GB+ VRAM)
|
||||
|
||||
### Option 2: Context Compression
|
||||
**Architecture:** Two-stage processing
|
||||
|
||||
**Stage 1:** Compression swarm (3-5 workers)
|
||||
- Split 60K into chunks
|
||||
- Summarize each chunk
|
||||
- Aggregate to 8K compressed context
|
||||
|
||||
**Stage 2:** Solution swarm (N workers)
|
||||
- Each worker gets 8K compressed + 2K relevant original
|
||||
- Generate independently
|
||||
- Vote on best
|
||||
|
||||
**Pros:**
|
||||
- Works with standard 8K models
|
||||
- Maintains swarm architecture
|
||||
- More workers possible
|
||||
|
||||
**Cons:**
|
||||
- Potential information loss
|
||||
- Added latency (~2-3s)
|
||||
|
||||
**Best for:** Users with 8-16GB VRAM who need 30K+ context
|
||||
|
||||
### Option 3: Hierarchical RAG
|
||||
**Architecture:** Three-tier system
|
||||
|
||||
**Tier 1:** Indexing swarm
|
||||
- Embed context into vector database
|
||||
- Create searchable knowledge graph
|
||||
|
||||
**Tier 2:** Retrieval + Generation
|
||||
- Query index for relevant context
|
||||
- Each worker gets ~6K retrieved + 2K raw
|
||||
- Generate solutions
|
||||
|
||||
**Tier 3:** Voting swarm
|
||||
- Rerank and consensus
|
||||
|
||||
**Pros:**
|
||||
- Scales to 100K+ tokens
|
||||
- Most robust to information loss
|
||||
- Specialized workers
|
||||
|
||||
**Cons:**
|
||||
- Complex implementation
|
||||
- 3x higher latency
|
||||
- Requires vector DB
|
||||
|
||||
**Best for:** Maximum accuracy, production deployments
|
||||
|
||||
## Current Local Swarm Implementation
|
||||
|
||||
Local Swarm currently uses **Ensemble Voting (Option 1)** with standard context windows:
|
||||
|
||||
- 2K-8K context (model dependent)
|
||||
- Each worker loads full model independently
|
||||
- No context sharing between workers
|
||||
- No offloading to system RAM (yet)
|
||||
|
||||
## Recommendations
|
||||
|
||||
### For 8K-16K Context
|
||||
Use current implementation with standard models
|
||||
|
||||
### For 30K+ Context
|
||||
Choose based on your hardware:
|
||||
|
||||
| Setup | Recommended Approach |
|
||||
|-------|---------------------|
|
||||
| RTX 4090 (24GB) | Option 1: Long-context models |
|
||||
| RTX 4060 Ti (16GB) | Option 2: Context compression |
|
||||
| Multiple machines (federated) | Option 2 or 3 |
|
||||
| CPU-only | Option 2 with aggressive compression |
|
||||
|
||||
### Memory-Constrained Setups
|
||||
Enable CPU offloading to run more workers:
|
||||
|
||||
```bash
|
||||
# llama.cpp example
|
||||
./main --cpu-partial 0.8 # Offload 80% to RAM
|
||||
```
|
||||
|
||||
## Future Enhancements
|
||||
|
||||
Potential improvements for Local Swarm:
|
||||
|
||||
1. **Context compression layer** (Option 2 implementation)
|
||||
2. **CPU offloading support** for memory-constrained systems
|
||||
3. **Hierarchical RAG** for enterprise use cases
|
||||
4. **Speculative decoding** for 2-3x speedup
|
||||
|
||||
## References
|
||||
|
||||
- vLLM PagedAttention: Efficient KV cache management
|
||||
- DeepSpeed ZeRO-Infinity: Offloading to CPU/NVMe
|
||||
- Mixtral 8x7B: Mixture of Experts architecture
|
||||
- Phi-3.5 Technical Report: Long-context small models
|
||||
@@ -0,0 +1,215 @@
|
||||
# Development Patterns Analysis
|
||||
|
||||
## Circular Development Issues Identified
|
||||
|
||||
### 1. Tool Execution Architecture (15+ commits going in circles)
|
||||
|
||||
**The Cycle:**
|
||||
```
|
||||
Add server-side tool execution → Fix looping issues → Remove/simplify instructions
|
||||
→ Tools don't work → Add tool host → Return tool_calls to client (reversal)
|
||||
→ Execute server-side again (reversal back) → Fix parsing → Simplify format
|
||||
→ Enhance instructions → Add streaming support → Fix streaming format...
|
||||
```
|
||||
|
||||
**Commits showing the cycle:**
|
||||
- `00cd483` - Add server-side tool execution
|
||||
- `df4587e` - Fix: prevent looping (checking for server-side results)
|
||||
- `c70f83a` - Fix: simplify looping prevention
|
||||
- `1b181bf` - Fix: remove tool instructions (40k → 0 tokens)
|
||||
- `bad8732` - Fix: simplify to ~300 tokens
|
||||
- `12eaac0` - Add distributed tool host
|
||||
- `b7fc184` - **REVERSAL:** Return tool_calls to opencode (not server-side)
|
||||
- `f83e6fc` - **REVERSAL BACK:** Execute via tool executor
|
||||
- `aa137b6` - Fix: handle tool_calls as single object or array
|
||||
- `539ca21` - Simplify format to TOOL:/ARGUMENTS: pattern
|
||||
- `aabd2b2` - Enhance instructions for multi-step operations
|
||||
|
||||
**Root Cause:** No clear architectural decision on:
|
||||
- Who executes tools? (Server vs Client)
|
||||
- What format? (JSON vs text patterns vs markdown)
|
||||
- When to add instructions? (Always vs first request vs never)
|
||||
|
||||
### 2. Tool Instruction Token Count (4 changes)
|
||||
|
||||
```
|
||||
40,000 tokens → 300 tokens → removed → enhanced (unknown count)
|
||||
```
|
||||
|
||||
**Problem:** No testing to validate if instructions actually work.
|
||||
|
||||
### 3. Tool Parsing (8+ fixes)
|
||||
|
||||
Multiple commits fixing the same parsing issues:
|
||||
- `c5b8196` - Parse nested JSON in arguments
|
||||
- `76b12b3` - Parse JavaScript-style output
|
||||
- `9d838c1` - Handle markdown code blocks
|
||||
- `e3701cf` - Extract content before tool_calls block
|
||||
- `aa137b6` - Handle single object or array
|
||||
- `539ca21` - Simplify to TOOL:/ARGUMENTS: pattern
|
||||
|
||||
**Problem:** No unit tests for parsing. Each fix only handles one case.
|
||||
|
||||
### 4. Streaming + Tools (4 commits)
|
||||
|
||||
```
|
||||
Disable streaming when tools present → Add to streaming path → Fix SSE format
|
||||
```
|
||||
|
||||
**Problem:** Two completely different code paths that diverge and need separate fixes.
|
||||
|
||||
### 5. Debugging Commits (6 commits)
|
||||
|
||||
Commits that only add debug logging:
|
||||
- `e0c500e` - "very visible request/response logging"
|
||||
- `25b675c` - "explicit logging for tool executor configuration"
|
||||
- `27e1971` - "response logging to both paths"
|
||||
- `e3eb52d` - "log message state"
|
||||
- `13e6fb2` - "add logging to tool call parsing"
|
||||
- `3039629` - "log request.tools"
|
||||
|
||||
**Problem:** Debugging in production instead of having tests.
|
||||
|
||||
## Why This Happens
|
||||
|
||||
### 1. No Tests
|
||||
- **Impact:** Every change requires manual testing
|
||||
- **Result:** Fixes break other cases, regressions common
|
||||
- **Evidence:** 25+ commits fixing tool-related issues
|
||||
|
||||
### 2. Production Debugging
|
||||
- **Pattern:** Add debug logging → Fix → Remove debug logging
|
||||
- **Commits:** `e0c500e`, `3728eb7` (add then clean up)
|
||||
- **Better:** Unit tests with mocked LLM responses
|
||||
|
||||
### 3. Architectural Ambiguity
|
||||
- **Question:** Who owns tool execution?
|
||||
- **Server-side:** Better for simple providers
|
||||
- **Client-side:** Better for complex opencode integration
|
||||
- **Actual:** Switched back and forth 3+ times
|
||||
|
||||
### 4. Feature Interaction Complexity
|
||||
- Tools + Streaming = Two paths to maintain
|
||||
- Tools + Federation = Distributed execution complexity
|
||||
- Tools + Different formats = Parsing nightmare
|
||||
|
||||
### 5. Unclear Requirements
|
||||
- Should instructions be in system prompt or user prompt?
|
||||
- How many tokens is acceptable?
|
||||
- What format should tools return?
|
||||
|
||||
## Recommendations to Prevent This
|
||||
|
||||
### Immediate (Prevents Next Cycle)
|
||||
|
||||
1. **Pick One Architecture**
|
||||
- Decision: Server-side execution via tool executor
|
||||
- Document why in ARCHITECTURE.md
|
||||
|
||||
2. **Token Budget**
|
||||
- Max 2000 tokens for tool instructions
|
||||
- Test with actual 16K context models
|
||||
- Never exceed 50% of context window
|
||||
|
||||
3. **One Format Only**
|
||||
- Standardize on: `TOOL: name\nARGUMENTS: {"key": "value"}`
|
||||
- Remove all other parsing code
|
||||
- Single regex pattern
|
||||
|
||||
4. **Add Unit Tests**
|
||||
```python
|
||||
# test_tool_parsing.py
|
||||
def test_parse_simple_tool():
|
||||
text = "TOOL: read\nARGUMENTS: {\"filePath\": \"test.txt\"}"
|
||||
content, tools = parse_tool_calls(text)
|
||||
assert len(tools) == 1
|
||||
assert tools[0]["function"]["name"] == "read"
|
||||
|
||||
def test_parse_no_tool():
|
||||
text = "Just a regular response"
|
||||
content, tools = parse_tool_calls(text)
|
||||
assert len(tools) == 0
|
||||
assert content == text
|
||||
|
||||
def test_parse_multiple_tools():
|
||||
text = "TOOL: read\nARGUMENTS: {...}\n\nTOOL: write\nARGUMENTS: {...}"
|
||||
content, tools = parse_tool_calls(text)
|
||||
assert len(tools) == 2
|
||||
```
|
||||
|
||||
5. **Integration Test Script**
|
||||
```bash
|
||||
# test_tools.sh
|
||||
python main.py --auto --test-tools
|
||||
# Tests: read file → write file → bash command
|
||||
# Exits with error code if any fail
|
||||
```
|
||||
|
||||
6. **Simplify Tool Instructions**
|
||||
- Current: ~300 tokens with 5 examples
|
||||
- Target: ~100 tokens with 2 examples
|
||||
- Include: read, write only (bash is obvious)
|
||||
|
||||
### Medium-term
|
||||
|
||||
7. **Separate Concerns**
|
||||
```
|
||||
src/tools/
|
||||
├── parser.py # Only parsing logic
|
||||
├── executor.py # Only execution logic
|
||||
├── formatter.py # Only formatting instructions
|
||||
└── integration.py # Only API integration
|
||||
```
|
||||
|
||||
8. **Design Doc Before Code**
|
||||
- For tool system changes, write 1-page design first
|
||||
- Include: format, token count, examples, test plan
|
||||
- Get it right on paper before coding
|
||||
|
||||
9. **Feature Flags**
|
||||
```python
|
||||
# config.py
|
||||
USE_SERVER_SIDE_TOOLS = True # Can toggle without code changes
|
||||
TOOL_INSTRUCTION_VERSION = "v2" # A/B test formats
|
||||
```
|
||||
|
||||
### Long-term
|
||||
|
||||
10. **CI/CD Pipeline**
|
||||
- Run tests on every PR
|
||||
- Block merge if tests fail
|
||||
- Include: unit tests, integration tests, token count check
|
||||
|
||||
11. **Observability**
|
||||
- Structured logging (not print statements)
|
||||
- Metrics: tool success rate, parsing errors, latency
|
||||
- Dashboard to see issues before users report them
|
||||
|
||||
## Current State Assessment
|
||||
|
||||
**Good:**
|
||||
- Tool executor abstraction exists
|
||||
- Distributed tool execution works
|
||||
- Working directory handling improved
|
||||
- Timeout handling for package managers
|
||||
|
||||
**Needs Work:**
|
||||
- Too many parsing code paths (simplify to one)
|
||||
- Instructions too long (reduce to <2000 tokens)
|
||||
- No automated testing
|
||||
- Debug logging still in production code
|
||||
|
||||
## Suggested Immediate Actions
|
||||
|
||||
1. Merge current cleanup branch (already done ✓)
|
||||
2. Remove all but one parsing format (done ✓)
|
||||
3. Reduce tool instructions to <2000 tokens (done ✓)
|
||||
4. Add unit tests for tool parsing (done ✓)
|
||||
5. Add integration test for tool execution
|
||||
|
||||
## Success Metrics
|
||||
|
||||
- Tool-related commits stabilize to <2 per month
|
||||
- Zero "fix: prevent looping" commits
|
||||
- All tool changes include tests
|
||||
- Instructions stay under 2000 tokens
|
||||
-524
@@ -1,524 +0,0 @@
|
||||
# Local Swarm - Complete Documentation
|
||||
|
||||
## Table of Contents
|
||||
|
||||
1. [Quick Start Guide](#quick-start-guide)
|
||||
2. [Opencode Configuration](#opencode-configuration)
|
||||
3. [API Reference](#api-reference)
|
||||
4. [Troubleshooting](#troubleshooting)
|
||||
5. [Advanced Configuration](#advanced-configuration)
|
||||
6. [Performance Tuning](#performance-tuning)
|
||||
|
||||
---
|
||||
|
||||
## Quick Start Guide
|
||||
|
||||
### Installation
|
||||
|
||||
**Windows:**
|
||||
```powershell
|
||||
git clone https://github.com/yourusername/local_swarm.git
|
||||
cd local_swarm
|
||||
.\scripts\install.bat
|
||||
```
|
||||
|
||||
**macOS/Linux:**
|
||||
```bash
|
||||
git clone https://github.com/yourusername/local_swarm.git
|
||||
cd local_swarm
|
||||
chmod +x scripts/install.sh
|
||||
./scripts/install.sh
|
||||
```
|
||||
|
||||
**Android (Termux):**
|
||||
```bash
|
||||
git clone https://github.com/yourusername/local_swarm.git
|
||||
cd local_swarm
|
||||
chmod +x scripts/install-termux.sh
|
||||
./scripts/install-termux.sh
|
||||
```
|
||||
|
||||
### First Run
|
||||
|
||||
```bash
|
||||
# Start with interactive menu
|
||||
python main.py
|
||||
|
||||
# Or skip menu with auto-detection
|
||||
python main.py --auto
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Opencode Configuration
|
||||
|
||||
### Basic Configuration
|
||||
|
||||
Add to your opencode configuration file (usually `~/.config/opencode/config.json`):
|
||||
|
||||
```json
|
||||
{
|
||||
"model": {
|
||||
"provider": "openai",
|
||||
"base_url": "http://localhost:8000/v1",
|
||||
"api_key": "not-needed",
|
||||
"model": "local-swarm"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Configuration with Local Swarm on Different Machine
|
||||
|
||||
If Local Swarm is running on another computer in your network:
|
||||
|
||||
```json
|
||||
{
|
||||
"model": {
|
||||
"provider": "openai",
|
||||
"base_url": "http://192.168.1.100:8000/v1",
|
||||
"api_key": "not-needed",
|
||||
"model": "local-swarm"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Multiple Model Options
|
||||
|
||||
You can configure multiple models and switch between them:
|
||||
|
||||
```json
|
||||
{
|
||||
"models": {
|
||||
"local-swarm": {
|
||||
"provider": "openai",
|
||||
"base_url": "http://localhost:8000/v1",
|
||||
"api_key": "not-needed",
|
||||
"model": "local-swarm"
|
||||
},
|
||||
"local-swarm-fast": {
|
||||
"provider": "openai",
|
||||
"base_url": "http://localhost:8000/v1",
|
||||
"api_key": "not-needed",
|
||||
"model": "local-swarm",
|
||||
"temperature": 0.2
|
||||
}
|
||||
},
|
||||
"default_model": "local-swarm"
|
||||
}
|
||||
```
|
||||
|
||||
### With Context Window Configuration
|
||||
|
||||
```json
|
||||
{
|
||||
"model": {
|
||||
"provider": "openai",
|
||||
"base_url": "http://localhost:8000/v1",
|
||||
"api_key": "not-needed",
|
||||
"model": "local-swarm",
|
||||
"max_tokens": 4096,
|
||||
"temperature": 0.7
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Environment-Specific Configurations
|
||||
|
||||
**Development (local only):**
|
||||
```json
|
||||
{
|
||||
"model": {
|
||||
"provider": "openai",
|
||||
"base_url": "http://localhost:8000/v1",
|
||||
"api_key": "not-needed",
|
||||
"model": "local-swarm",
|
||||
"temperature": 0.8
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Production (federated swarm):**
|
||||
```json
|
||||
{
|
||||
"model": {
|
||||
"provider": "openai",
|
||||
"base_url": "http://swarm-coordinator.local:8000/v1",
|
||||
"api_key": "not-needed",
|
||||
"model": "local-swarm",
|
||||
"temperature": 0.5
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Testing the Configuration
|
||||
|
||||
After configuring opencode, test with:
|
||||
|
||||
```bash
|
||||
# Simple test
|
||||
opencode --version
|
||||
|
||||
# Test with a prompt
|
||||
echo "Write a Python function to calculate factorial" | opencode
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## API Reference
|
||||
|
||||
### OpenAI-Compatible Endpoints
|
||||
|
||||
Local Swarm implements the OpenAI API specification.
|
||||
|
||||
#### POST /v1/chat/completions
|
||||
|
||||
Generate a chat completion.
|
||||
|
||||
**Request:**
|
||||
```json
|
||||
{
|
||||
"model": "local-swarm",
|
||||
"messages": [
|
||||
{"role": "user", "content": "Write a Python function to calculate factorial"}
|
||||
],
|
||||
"max_tokens": 2048,
|
||||
"temperature": 0.7,
|
||||
"stream": false
|
||||
}
|
||||
```
|
||||
|
||||
**Response:**
|
||||
```json
|
||||
{
|
||||
"id": "chatcmpl-abc123",
|
||||
"object": "chat.completion",
|
||||
"created": 1234567890,
|
||||
"model": "local-swarm",
|
||||
"choices": [{
|
||||
"index": 0,
|
||||
"message": {
|
||||
"role": "assistant",
|
||||
"content": "def factorial(n):\n if n <= 1:\n return 1\n return n * factorial(n-1)"
|
||||
},
|
||||
"finish_reason": "stop"
|
||||
}],
|
||||
"usage": {
|
||||
"prompt_tokens": 15,
|
||||
"completion_tokens": 25,
|
||||
"total_tokens": 40
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
#### GET /v1/models
|
||||
|
||||
List available models.
|
||||
|
||||
**Response:**
|
||||
```json
|
||||
{
|
||||
"object": "list",
|
||||
"data": [
|
||||
{
|
||||
"id": "local-swarm",
|
||||
"object": "model",
|
||||
"created": 1234567890,
|
||||
"owned_by": "local-swarm"
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
#### GET /health
|
||||
|
||||
Check health status.
|
||||
|
||||
**Response:**
|
||||
```json
|
||||
{
|
||||
"status": "healthy",
|
||||
"version": "0.1.0",
|
||||
"workers": 5,
|
||||
"model": "Qwen 2.5 Coder 7b (q4_k_m)"
|
||||
}
|
||||
```
|
||||
|
||||
#### Federation Endpoints (when enabled)
|
||||
|
||||
**GET /v1/federation/status**
|
||||
```json
|
||||
{
|
||||
"enabled": true,
|
||||
"total_peers": 3,
|
||||
"healthy_peers": 3,
|
||||
"strategy": "weighted"
|
||||
}
|
||||
```
|
||||
|
||||
**GET /v1/federation/peers**
|
||||
```json
|
||||
{
|
||||
"peers": [
|
||||
{
|
||||
"name": "desktop-pc",
|
||||
"host": "192.168.1.100",
|
||||
"port": 8000,
|
||||
"model_id": "qwen2.5-coder:7b:q4_k_m",
|
||||
"instances": 3
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Common Issues
|
||||
|
||||
#### Issue: "No module named 'llama_cpp'"
|
||||
|
||||
**Solution:**
|
||||
```bash
|
||||
# Install with pre-built wheel (recommended)
|
||||
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121
|
||||
|
||||
# Or CPU-only
|
||||
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
|
||||
```
|
||||
|
||||
#### Issue: "CUDA not detected" on Windows
|
||||
|
||||
**Solution:**
|
||||
1. Install NVIDIA drivers: https://www.nvidia.com/drivers
|
||||
2. Verify with: `nvidia-smi`
|
||||
3. Reinstall with CUDA support:
|
||||
```powershell
|
||||
pip uninstall llama-cpp-python
|
||||
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121
|
||||
```
|
||||
|
||||
#### Issue: "Out of memory" errors
|
||||
|
||||
**Solution:**
|
||||
```bash
|
||||
# Reduce instances
|
||||
python main.py --instances 2
|
||||
|
||||
# Or use smaller model
|
||||
python main.py --model qwen2.5-coder:3b:q4
|
||||
```
|
||||
|
||||
#### Issue: Slow performance on CPU
|
||||
|
||||
**Solution:**
|
||||
- Use smaller models (3B instead of 7B)
|
||||
- Use Q4 quantization instead of Q6
|
||||
- Reduce number of instances to 2-3
|
||||
- Close other applications
|
||||
|
||||
#### Issue: "No suitable model found"
|
||||
|
||||
**Solution:**
|
||||
Your system has less than 2GB available memory. Try:
|
||||
- Close other applications
|
||||
- Use CPU-only mode (automatic if no GPU)
|
||||
- Add more RAM or use a machine with GPU
|
||||
|
||||
#### Issue: Models not downloading
|
||||
|
||||
**Solution:**
|
||||
```bash
|
||||
# Check internet connection
|
||||
ping huggingface.co
|
||||
|
||||
# Try manual download
|
||||
python main.py --download-only
|
||||
|
||||
# Check cache directory
|
||||
ls ~/.local_swarm/models
|
||||
```
|
||||
|
||||
### Platform-Specific Issues
|
||||
|
||||
**Windows:**
|
||||
- Ensure Python is in PATH
|
||||
- Run PowerShell as Administrator if needed
|
||||
- Install Visual C++ Redistributable
|
||||
|
||||
**macOS:**
|
||||
- Xcode Command Line Tools: `xcode-select --install`
|
||||
- May need to allow llama.cpp in Security preferences
|
||||
|
||||
**Linux:**
|
||||
- Install build essentials: `sudo apt-get install build-essential`
|
||||
- For AMD: Install ROCm drivers
|
||||
- For Intel: Install oneAPI toolkit
|
||||
|
||||
---
|
||||
|
||||
## Advanced Configuration
|
||||
|
||||
### Configuration File (config.yaml)
|
||||
|
||||
Create `config.yaml` in the project root:
|
||||
|
||||
```yaml
|
||||
server:
|
||||
host: "127.0.0.1"
|
||||
port: 8000
|
||||
|
||||
swarm:
|
||||
consensus_strategy: "similarity" # similarity, quality, fastest
|
||||
min_instances: 2
|
||||
max_instances: 5
|
||||
|
||||
federation:
|
||||
enabled: false
|
||||
discovery_port: 8765
|
||||
federation_port: 8766
|
||||
max_peers: 10
|
||||
|
||||
hardware:
|
||||
gpu_memory_fraction: 1.0 # Use 100% of GPU VRAM
|
||||
ram_fraction: 0.5 # Use 50% of system RAM for CPU
|
||||
|
||||
models:
|
||||
cache_dir: "~/.local_swarm/models"
|
||||
preferred_models:
|
||||
- qwen2.5-coder
|
||||
- deepseek-coder
|
||||
```
|
||||
|
||||
### Environment Variables
|
||||
|
||||
```bash
|
||||
# Custom cache directory
|
||||
export LOCAL_SWARM_CACHE_DIR="/path/to/models"
|
||||
|
||||
# Debug mode
|
||||
export LOCAL_SWARM_DEBUG=1
|
||||
|
||||
# Custom config file
|
||||
export LOCAL_SWARM_CONFIG="/path/to/config.yaml"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Performance Tuning
|
||||
|
||||
### For Maximum Speed
|
||||
|
||||
```bash
|
||||
# Use smaller model
|
||||
python main.py --model qwen2.5-coder:3b:q4
|
||||
|
||||
# Reduce instances (less memory contention)
|
||||
python main.py --instances 2
|
||||
|
||||
# Skip consensus (single worker)
|
||||
# Edit config: consensus_strategy: "fastest"
|
||||
```
|
||||
|
||||
### For Maximum Quality
|
||||
|
||||
```bash
|
||||
# Use largest model that fits
|
||||
python main.py --model qwen2.5-coder:7b:q6
|
||||
|
||||
# More instances for better consensus
|
||||
python main.py --instances 5
|
||||
|
||||
# Use quality consensus strategy
|
||||
# Edit config: consensus_strategy: "quality"
|
||||
```
|
||||
|
||||
### For Balanced Performance
|
||||
|
||||
```bash
|
||||
# Recommended defaults (automatic)
|
||||
python main.py
|
||||
|
||||
# Or explicitly
|
||||
python main.py --model qwen2.5-coder:7b:q4
|
||||
```
|
||||
|
||||
### Memory Usage by Model
|
||||
|
||||
| Model Size | Q4 VRAM | Q5 VRAM | Q6 VRAM |
|
||||
|------------|---------|---------|---------|
|
||||
| 1B-3B | 0.7-2GB | 0.9-2.5GB | 1.1-3GB |
|
||||
| 7B | 4.5GB | 5.2GB | 6.0GB |
|
||||
| 13B-15B | 8-9GB | 9.5-11GB | 11-13GB |
|
||||
|
||||
**Recommended:** Use Q4_K_M for best speed/quality balance.
|
||||
|
||||
---
|
||||
|
||||
## MCP Server Configuration
|
||||
|
||||
### Enable MCP Server
|
||||
|
||||
```bash
|
||||
python main.py --mcp
|
||||
```
|
||||
|
||||
### MCP Tools Available
|
||||
|
||||
When MCP is enabled, AI assistants can use:
|
||||
|
||||
- `get_hardware_info` - Query system capabilities
|
||||
- `get_swarm_status` - Check swarm health
|
||||
- `generate_code` - Generate with consensus
|
||||
- `list_available_models` - Browse models
|
||||
- `get_worker_details` - Worker statistics
|
||||
|
||||
### Testing MCP
|
||||
|
||||
```bash
|
||||
# List available tools
|
||||
mcp-cli call local-swarm list_tools
|
||||
|
||||
# Call a tool
|
||||
mcp-cli call local-swarm call_tool get_swarm_status
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Network Federation
|
||||
|
||||
### Setup Federated Swarm
|
||||
|
||||
On each machine in your network:
|
||||
|
||||
```bash
|
||||
# Machine 1 (Windows PC with RTX 4060)
|
||||
python main.py --federation --port 8000
|
||||
|
||||
# Machine 2 (Mac Mini M1)
|
||||
python main.py --federation --port 8000
|
||||
|
||||
# Machine 3 (Linux with AMD GPU)
|
||||
python main.py --federation --port 8000
|
||||
```
|
||||
|
||||
Machines will auto-discover each other via mDNS.
|
||||
|
||||
### Verify Federation
|
||||
|
||||
```bash
|
||||
curl http://localhost:8000/v1/federation/status
|
||||
curl http://localhost:8000/v1/federation/peers
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Getting Help
|
||||
|
||||
- **GitHub Issues:** https://github.com/sleepyeldrazi/local_swarm/issues
|
||||
- **Interactive Help:** Run `python main.py` and select `[t] Tips & Help`
|
||||
- **Hardware Detection:** Run `python main.py --detect`
|
||||
|
||||
## License
|
||||
|
||||
MIT License - See LICENSE file
|
||||
@@ -0,0 +1,92 @@
|
||||
# Design Decision: Complete React Example with Actual Code
|
||||
|
||||
**Date:** 2024-02-24
|
||||
**Scope:** src/api/routes.py tool_instructions
|
||||
|
||||
## Problem
|
||||
|
||||
Model is still not following instructions:
|
||||
1. Tries `npm install` before creating package.json
|
||||
2. Still tries `npx create-react-app` despite being told not to
|
||||
3. Instructions have placeholders like "..." and "etc." which models don't understand
|
||||
|
||||
## Root Cause
|
||||
|
||||
The current instructions say:
|
||||
```
|
||||
TOOL: write
|
||||
ARGUMENTS: {"filePath": "myapp/package.json", "content": "{\"name\": \"myapp\", \"version\": \"1.0.0\", \"dependencies\": {\"react\": \"^18.0.0\", \"react-dom\": \"^18.0.0\"}}"}
|
||||
|
||||
[Continue with src/index.js, src/App.js, public/index.html, etc.]
|
||||
```
|
||||
|
||||
**Problem:** "etc." and "..." are meaningless to LLMs. They need concrete examples.
|
||||
|
||||
## Solution
|
||||
|
||||
Provide a **complete, working, minimal React example** with actual file contents:
|
||||
|
||||
1. Exact sequence: mkdir → write package.json → write src/App.js → write src/index.js → write public/index.html → npm install
|
||||
2. Actual file content, not placeholders
|
||||
3. Minimal viable React app (not full create-react-app structure)
|
||||
|
||||
## Implementation
|
||||
|
||||
Replace vague example with complete working code:
|
||||
|
||||
```
|
||||
**COMPLETE REACT HELLO WORLD EXAMPLE:**
|
||||
|
||||
User: "Create a React Hello World app"
|
||||
|
||||
Step 1 - Create directory:
|
||||
TOOL: bash
|
||||
ARGUMENTS: {"command": "mkdir myapp"}
|
||||
|
||||
Step 2 - Create package.json (MUST do this BEFORE npm install):
|
||||
TOOL: write
|
||||
ARGUMENTS: {"filePath": "myapp/package.json", "content": "{\"name\": \"myapp\", \"version\": \"1.0.0\", \"private\": true, \"dependencies\": {\"react\": \"^18.2.0\", \"react-dom\": \"^18.2.0\"}, \"scripts\": {\"start\": \"react-scripts start\", \"build\": \"react-scripts build\"}, \"devDependencies\": {\"react-scripts\": \"5.0.1\"}}"}
|
||||
|
||||
Step 3 - Create src directory:
|
||||
TOOL: bash
|
||||
ARGUMENTS: {"command": "mkdir myapp/src"}
|
||||
|
||||
Step 4 - Create App.js:
|
||||
TOOL: write
|
||||
ARGUMENTS: {"filePath": "myapp/src/App.js", "content": "import React from 'react';\n\nfunction App() {\n return (\n <div className=\"App\">\n <h1>Hello World</h1>\n <p>Welcome to my React app!</p>\n </div>\n );\n}\n\nexport default App;"}
|
||||
|
||||
Step 5 - Create index.js:
|
||||
TOOL: write
|
||||
ARGUMENTS: {"filePath": "myapp/src/index.js", "content": "import React from 'react';\nimport ReactDOM from 'react-dom/client';\nimport App from './App';\n\nconst root = ReactDOM.createRoot(document.getElementById('root'));\nroot.render(<App />);"}
|
||||
|
||||
Step 6 - Create public directory and index.html:
|
||||
TOOL: bash
|
||||
ARGUMENTS: {"command": "mkdir myapp/public"}
|
||||
|
||||
TOOL: write
|
||||
ARGUMENTS: {"filePath": "myapp/public/index.html", "content": "<!DOCTYPE html>\n<html lang=\"en\">\n<head>\n <meta charset=\"UTF-8\">\n <meta name=\"viewport\" content=\"width=device-width, initial-scale=1.0\">\n <title>React App</title>\n</head>\n<body>\n <div id=\"root\"></div>\n</body>\n</html>"}
|
||||
|
||||
Step 7 - NOW install dependencies (AFTER package.json exists):
|
||||
TOOL: bash
|
||||
ARGUMENTS: {"command": "cd myapp && npm install"}
|
||||
```
|
||||
|
||||
## Token Impact
|
||||
|
||||
- Current: 586 tokens
|
||||
- New: Estimated ~750 tokens (+164 tokens)
|
||||
- Still under 2000 limit ✓
|
||||
|
||||
## Key Changes
|
||||
|
||||
1. **Explicit sequencing:** "Step 1", "Step 2", etc.
|
||||
2. **Actual code:** No "..." or "etc." - real working content
|
||||
3. **Critical note:** "MUST do this BEFORE npm install"
|
||||
4. **Minimal structure:** Just what's needed for Hello World
|
||||
|
||||
## Success Criteria
|
||||
|
||||
- [ ] Model creates package.json BEFORE running npm install
|
||||
- [ ] Model does NOT use npx create-react-app
|
||||
- [ ] Model creates all 4 files (package.json, App.js, index.js, index.html)
|
||||
- [ ] Model runs npm install last (after files exist)
|
||||
@@ -0,0 +1,84 @@
|
||||
# Design Decision: Fix Subprocess Hang on Interactive Commands
|
||||
|
||||
**Date:** 2024-02-24
|
||||
**Scope:** src/tools/executor.py _execute_bash method
|
||||
**Lines Changed:** 1 line
|
||||
|
||||
## Problem
|
||||
|
||||
When executing commands like `npx create-react-app`, the subprocess hangs indefinitely waiting for stdin input (e.g., "Ok to proceed? (y)"). This causes:
|
||||
1. 300s timeout to be reached
|
||||
2. opencode to hang waiting for response
|
||||
3. Poor user experience
|
||||
|
||||
## Root Cause
|
||||
|
||||
`subprocess.run()` by default inherits stdin from parent process. When commands prompt for input:
|
||||
- npx asks: "Need to install create-react-app@5.1.0 Ok to proceed? (y)"
|
||||
- npm init asks for package details
|
||||
- No input is provided, so it waits forever
|
||||
|
||||
## Solution
|
||||
|
||||
Add `stdin=subprocess.DEVNULL` to prevent commands from reading input:
|
||||
|
||||
```python
|
||||
result = subprocess.run(
|
||||
command,
|
||||
shell=True,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
timeout=timeout,
|
||||
cwd=cwd,
|
||||
stdin=subprocess.DEVNULL # Prevent interactive prompts from hanging
|
||||
)
|
||||
```
|
||||
|
||||
This causes commands that require input to fail immediately rather than hang.
|
||||
|
||||
## Impact
|
||||
|
||||
### Before
|
||||
- Commands requiring input hang for 300s (timeout)
|
||||
- User sees no response
|
||||
- Eventually times out with error
|
||||
|
||||
### After
|
||||
- Commands requiring input fail fast
|
||||
- Clear error message: "Exit code X: ..."
|
||||
- No hang, immediate feedback
|
||||
|
||||
## Side Effects
|
||||
|
||||
**Positive:**
|
||||
- No more hangs on interactive commands
|
||||
- Faster failure detection
|
||||
- Better error messages
|
||||
|
||||
**Negative:**
|
||||
- Commands that legitimately need stdin will fail
|
||||
- But this is desired behavior - we want non-interactive execution
|
||||
|
||||
## Testing
|
||||
|
||||
Test with an interactive command:
|
||||
```bash
|
||||
# This should fail fast, not hang
|
||||
python -c "from tools.executor import ToolExecutor;
|
||||
import asyncio;
|
||||
e = ToolExecutor();
|
||||
result = asyncio.run(e.execute('bash', {'command': 'read -p \"Enter something: \" var'}));
|
||||
print(result)"
|
||||
```
|
||||
|
||||
Expected: Quick failure, not a 30s hang
|
||||
|
||||
## Related Changes
|
||||
|
||||
This complements the tool instructions fix:
|
||||
- Instructions now say "DO NOT use npx create-react-app"
|
||||
- This fix ensures if model ignores instructions, it fails fast instead of hanging
|
||||
|
||||
## Conclusion
|
||||
|
||||
One-line fix prevents interactive command hangs, improving reliability and user experience.
|
||||
@@ -0,0 +1,178 @@
|
||||
# Design Decision: Fix Tool Execution and Token Reporting
|
||||
|
||||
**Date:** 2024-02-24
|
||||
**Scope:** src/api/routes.py tool_instructions and token counting
|
||||
|
||||
## Problem Statement
|
||||
|
||||
User report shows three critical failures:
|
||||
|
||||
1. **Instruction vs Execution:** Model says "You should run mkdir..." instead of TOOL: format
|
||||
2. **Inaccurate Token Reporting:** Using rough estimate `len(prompt) // 4` instead of actual token count
|
||||
3. **Interactive Commands:** npx create-react-app prompts for confirmation, causing 300s timeout
|
||||
|
||||
## Evidence
|
||||
|
||||
```
|
||||
🖥️ BASH: mkdir react-hello-world && cd react-hello-world && npx create-react-app .
|
||||
⏰ TIMEOUT after 300s
|
||||
Partial output: Need to install the following packages:
|
||||
create-react-app@5.1.0
|
||||
Ok to proceed? (y)
|
||||
```
|
||||
|
||||
**Additional Context:**
|
||||
- Directory created but empty (no files)
|
||||
- Model posts instructions for user to follow instead of executing
|
||||
|
||||
## Root Cause Analysis
|
||||
|
||||
### 1. Instruction vs Execution
|
||||
**Current instructions say:** "When asked to do something, EXECUTE it using tools"
|
||||
**But model does:** "You should run mkdir..."
|
||||
**Why:** Instructions aren't strong enough - need explicit anti-patterns
|
||||
|
||||
### 2. Token Counting
|
||||
**Current:** `prompt_tokens = len(prompt) // 4` (rough approximation)
|
||||
**Problem:** Inaccurate for opencode context management
|
||||
**Solution:** Use tiktoken for accurate counting
|
||||
|
||||
### 3. Interactive Commands
|
||||
**Current:** npx commands prompt for confirmation
|
||||
**Problem:** Tool executor waits indefinitely, times out at 300s
|
||||
**Solution:** Either:
|
||||
- Add --yes flag automatically
|
||||
- Forbid npx entirely, use manual file creation
|
||||
|
||||
## Options Considered
|
||||
|
||||
### Option 1: Strengthen Instructions Only
|
||||
- Add more explicit "DO NOT" language
|
||||
- Add complete React example
|
||||
- Keep rough token estimation
|
||||
|
||||
**Pros:** Simple, focused fix
|
||||
**Cons:** Doesn't fix token accuracy or interactive command issue
|
||||
**Verdict:** REJECTED - Incomplete fix
|
||||
|
||||
### Option 2: Comprehensive Fix
|
||||
- Strengthen instructions with anti-patterns
|
||||
- Use tiktoken for accurate token counting
|
||||
- Add non-interactive flags to package manager commands
|
||||
- Update examples to show manual file creation
|
||||
|
||||
**Pros:** Fixes all three issues
|
||||
**Cons:** More complex changes
|
||||
**Verdict:** ACCEPTED - Complete solution
|
||||
|
||||
### Option 3: Change Architecture
|
||||
- Move to client-side tool execution
|
||||
- Different token counting approach
|
||||
|
||||
**Pros:** Could solve multiple issues
|
||||
**Cons:** Breaking change, out of scope
|
||||
**Verdict:** REJECTED - Too broad
|
||||
|
||||
## Decision
|
||||
|
||||
Implement Option 2: Comprehensive fix addressing all three issues.
|
||||
|
||||
### Changes
|
||||
|
||||
#### 1. Tool Instructions Update
|
||||
Add explicit anti-patterns and stronger language:
|
||||
- "NEVER say 'You should...' - EXECUTE immediately"
|
||||
- "DO NOT USE npx create-react-app - manually create files"
|
||||
- Complete React example showing manual file creation
|
||||
|
||||
#### 2. Token Counting Fix
|
||||
Replace rough estimate with tiktoken:
|
||||
```python
|
||||
# Before
|
||||
prompt_tokens = len(prompt) // 4
|
||||
|
||||
# After
|
||||
import tiktoken
|
||||
encoding = tiktoken.get_encoding('cl100k_base')
|
||||
prompt_tokens = len(encoding.encode(prompt))
|
||||
completion_tokens = len(encoding.encode(content))
|
||||
```
|
||||
|
||||
#### 3. Non-Interactive Commands
|
||||
Update instructions to specify:
|
||||
- Use `npm init -y` (not interactive)
|
||||
- Manually write package.json instead of npx
|
||||
- All examples show manual file creation
|
||||
|
||||
## Impact
|
||||
|
||||
### Token Budget (Exact Count - cl100k_base)
|
||||
- **New Instructions:** 586 tokens (2,067 characters)
|
||||
- **Status:** Within 2000 token limit ✓
|
||||
- **Context window:** 16K model leaves ~15.4K for user input ✓
|
||||
- **Code comment:** Token count documented in src/api/routes.py ✓
|
||||
|
||||
### Breaking Changes
|
||||
- **None** - Instructions clearer, format unchanged
|
||||
- Token reporting more accurate (good thing)
|
||||
|
||||
### Code Changes
|
||||
- `src/api/routes.py`:
|
||||
- Update tool_instructions (~+15 lines)
|
||||
- Add tiktoken import
|
||||
- Replace token estimation logic (~5 lines)
|
||||
|
||||
## Testing Strategy
|
||||
|
||||
1. **Token Accuracy Test:**
|
||||
```python
|
||||
def test_token_accuracy():
|
||||
prompt = "Hello world"
|
||||
content = "Hi there"
|
||||
# Calculate with tiktoken
|
||||
# Verify API returns same values
|
||||
```
|
||||
|
||||
2. **Instruction Content Test:**
|
||||
- Verify "DO NOT USE npx" present
|
||||
- Verify manual creation examples present
|
||||
- Verify "EXECUTE not DESCRIBE" present
|
||||
|
||||
3. **Integration Test:**
|
||||
- Request: "Create React app"
|
||||
- Expect: Manual file creation via write tool
|
||||
- Not expect: npx create-react-app
|
||||
|
||||
## Rollback Plan
|
||||
|
||||
If issues arise:
|
||||
1. Revert to previous instructions
|
||||
2. Keep tiktoken for token counting (beneficial)
|
||||
3. Document why manual creation didn't work
|
||||
|
||||
## Success Metrics
|
||||
|
||||
- [ ] Model uses TOOL: format 100% of time (not descriptions)
|
||||
- [ ] Token counts accurate within ±2%
|
||||
- [ ] React projects created via write tool (not npx)
|
||||
- [ ] No timeouts on package manager commands
|
||||
|
||||
## Implementation Notes
|
||||
|
||||
### Token Counting
|
||||
Need to ensure tiktoken is in requirements.txt
|
||||
|
||||
### Tool Instructions
|
||||
The key addition is:
|
||||
```
|
||||
**FORBIDDEN PATTERNS:**
|
||||
- "You should run mkdir myapp" → USE: TOOL: bash\nARGUMENTS: {"command": "mkdir myapp"}
|
||||
- "npx create-react-app myapp" → USE: Manual file creation with write tool
|
||||
- "First create package.json, then..." → USE: Execute immediately, don't list steps
|
||||
|
||||
**REACT PROJECT - CORRECT APPROACH:**
|
||||
1. TOOL: bash, ARGUMENTS: {"command": "mkdir myapp"}
|
||||
2. TOOL: write, ARGUMENTS: {"filePath": "myapp/package.json", "content": "{\"name\": \"myapp\"...}"}
|
||||
3. TOOL: write, ARGUMENTS: {"filePath": "myapp/src/index.js", "content": "..."}
|
||||
4. Continue until all files created
|
||||
```
|
||||
@@ -0,0 +1,172 @@
|
||||
# Design Decision: Improved Tool Instructions
|
||||
|
||||
**Date:** 2024-02-24
|
||||
**Scope:** src/api/routes.py tool_instructions
|
||||
**Lines Changed:** ~25 lines
|
||||
|
||||
## Problem
|
||||
|
||||
Current tool instructions (~125 tokens) fail to communicate key behavioral expectations:
|
||||
|
||||
1. **Passive vs Active:** Model describes what to do instead of doing it
|
||||
2. **Refusal:** Model claims "I am only an AI assistant" instead of executing
|
||||
3. **Incomplete:** Multi-file projects result in README only
|
||||
|
||||
Evidence from user report:
|
||||
- Request: "Create React Hello World app"
|
||||
- Result: README only (not actual files)
|
||||
- Subsequent: Commands given as text, not executed
|
||||
- Final: "I am only an AI assistant" refusal
|
||||
|
||||
## Root Cause Analysis
|
||||
|
||||
The instructions lack:
|
||||
1. **Authority statement** - "You CAN and SHOULD use tools"
|
||||
2. **Execution mandate** - "Execute commands, don't just describe them"
|
||||
3. **Workflow clarity** - Clear step-by-step expectations
|
||||
4. **Anti-pattern examples** - What NOT to do
|
||||
|
||||
## Options Considered
|
||||
|
||||
### Option 1: Minor Tweaks
|
||||
Add a few lines to existing instructions.
|
||||
- **Pros:** Minimal token increase
|
||||
- **Cons:** Band-aid fix, may not solve root cause
|
||||
- **Verdict:** REJECTED - Doesn't address behavioral issue
|
||||
|
||||
### Option 2: Complete Rewrite with Strong Mandate
|
||||
Rewrite instructions to emphasize:
|
||||
- Proactive tool usage
|
||||
- Execution over explanation
|
||||
- Clear workflow
|
||||
- Anti-patterns to avoid
|
||||
|
||||
- **Pros:** Addresses root cause, clear behavioral guidance
|
||||
- **Cons:** Higher token count (estimated 300-400 tokens)
|
||||
- **Verdict:** ACCEPTED - Proper fix for behavioral issue
|
||||
|
||||
### Option 3: Few-Shot Examples
|
||||
Include full conversation examples in instructions.
|
||||
- **Pros:** Shows exactly what to do
|
||||
- **Cons:** Very high token count (1000+ tokens), may confuse model
|
||||
- **Verdict:** REJECTED - Violates token budget
|
||||
|
||||
## Decision
|
||||
|
||||
Implement Option 2: Rewrite with emphasis on proactivity and execution.
|
||||
|
||||
**Key additions:**
|
||||
1. **Capability statement:** "You have tools. Use them."
|
||||
2. **Execution mandate:** "Don't describe, execute"
|
||||
3. **Workflow:** Clear request→tool→result→next cycle
|
||||
4. **Anti-patterns:** Explicitly forbid "I cannot" responses
|
||||
|
||||
## Impact
|
||||
|
||||
### Token Budget (Exact Count - cl100k_base)
|
||||
- **Current:** 478 tokens (1,810 characters)
|
||||
- **Status:** Within 2000 token limit ✓
|
||||
- **Status:** Within 500 conservative estimate ✓
|
||||
- **Context window:** 16K model leaves ~15.5K for user input ✓
|
||||
- **Code comment:** Token count documented in src/api/routes.py ✓
|
||||
|
||||
### Code Changes
|
||||
- **File:** src/api/routes.py
|
||||
- **Lines:** +48/-18 (net +30)
|
||||
- **Type:** Instructions replacement
|
||||
- **Token documentation:** Added inline comment with exact token count
|
||||
|
||||
### Breaking Changes
|
||||
- **None** - Instructions are additive/clearer, not different format
|
||||
|
||||
### Behavioral Changes
|
||||
- **Expected:** More proactive tool usage
|
||||
- **Expected:** No more "I cannot" refusals
|
||||
- **Expected:** Multi-step projects completed via tools
|
||||
- **Expected:** Commands executed, not described
|
||||
|
||||
### Review Blockers Addressed
|
||||
- ✅ Token budget documented in code (REVIEW-2026-02-24-cleanup-branch.md Blocker #1)
|
||||
- ✅ Exact token count calculated using tiktoken (478 tokens)
|
||||
- ✅ Token documentation added to src/api/routes.py (REVIEW-2026-02-24-cleanup-branch.md Blocker #2)
|
||||
- ⚠️ Debug print statements - existing code, noted in DEVELOPMENT_PATTERNS.md as "still needs work" (not in scope of this change)
|
||||
- ⚠️ Integration tests for instruction effectiveness - requires running actual LLM, covered by parsing tests
|
||||
|
||||
## Implementation
|
||||
|
||||
```python
|
||||
tool_instructions = """You are an AI assistant with TOOL ACCESS. You CAN and MUST use tools to accomplish tasks.
|
||||
|
||||
**CRITICAL RULES:**
|
||||
1. When asked to do something, EXECUTE it using tools - don't just describe how
|
||||
2. You MUST use the bash tool to run commands like mkdir, npm, ls, etc.
|
||||
3. You MUST use the write tool to create files
|
||||
4. NEVER say "I cannot" or "I am only an AI assistant" - you HAVE tools, USE them
|
||||
5. Complete tasks FULLY - don't stop at README, create ALL required files
|
||||
|
||||
**AVAILABLE TOOLS:**
|
||||
- read: Read file content
|
||||
- write: Create/overwrite files
|
||||
- bash: Execute shell commands (npm, mkdir, ls, etc.)
|
||||
|
||||
**TOOL FORMAT (STRICT):**
|
||||
TOOL: tool_name
|
||||
ARGUMENTS: {"param": "value"}
|
||||
|
||||
**WORKFLOW:**
|
||||
1. User asks for something
|
||||
2. You decide what tool to use
|
||||
3. You respond with ONLY the TOOL: format above
|
||||
4. You receive the tool result
|
||||
5. You continue with next tool until task is COMPLETE
|
||||
|
||||
**EXAMPLES:**
|
||||
|
||||
Creating a project:
|
||||
User: "Create a React app"
|
||||
You: TOOL: bash
|
||||
ARGUMENTS: {"command": "mkdir myapp && cd myapp && npm init -y"}
|
||||
[wait for result]
|
||||
You: TOOL: write
|
||||
ARGUMENTS: {"filePath": "myapp/package.json", "content": "..."}
|
||||
[continue until all files created]
|
||||
|
||||
Running commands:
|
||||
User: "Install dependencies"
|
||||
You: TOOL: bash
|
||||
ARGUMENTS: {"command": "npm install"}
|
||||
[wait for result, then confirm completion]
|
||||
|
||||
**WHAT NOT TO DO:**
|
||||
- ❌ "To create a React app, you should run: mkdir myapp" (describing)
|
||||
- ❌ "I cannot run commands, I am an AI" (refusing)
|
||||
- ❌ Creating only README instead of full project (incomplete)
|
||||
- ❌ "First do X, then do Y" (giving instructions instead of doing)
|
||||
|
||||
**CORRECT BEHAVIOR:**
|
||||
- ✅ Execute the command immediately using the bash tool
|
||||
- ✅ Create all files using the write tool
|
||||
- ✅ Continue until task is 100% complete
|
||||
- ✅ Use ONE tool at a time and wait for results"""
|
||||
```
|
||||
|
||||
## Testing
|
||||
|
||||
1. Test with React Hello World request
|
||||
2. Verify model uses bash to create directory structure
|
||||
3. Verify model uses write to create all files
|
||||
4. Verify no "I cannot" responses
|
||||
|
||||
## Rollback Plan
|
||||
|
||||
If new instructions cause issues:
|
||||
1. Revert to previous ~125 token version
|
||||
2. Analyze what specifically failed
|
||||
3. Iterate on smaller changes
|
||||
|
||||
## Success Metrics
|
||||
|
||||
- [ ] Model uses tools on first request (not after prompting)
|
||||
- [ ] Zero "I cannot" or "I am an AI" responses
|
||||
- [ ] Multi-file projects fully created
|
||||
- [ ] Commands executed, not described
|
||||
@@ -0,0 +1,151 @@
|
||||
# Design Decision: Task Planning and Verification Workflow
|
||||
|
||||
**Date:** 2024-02-24
|
||||
**Scope:** src/api/routes.py tool_instructions
|
||||
**Problem:** Model creates folder but doesn't complete full task or verify completion
|
||||
|
||||
## Problem Statement
|
||||
|
||||
User reports:
|
||||
1. "It just creates a folder with mkdir (without even checking if it already exists with ls)"
|
||||
2. No verification that tasks are completed
|
||||
3. No planning of full task scope
|
||||
4. Model stops after one step instead of completing entire project
|
||||
|
||||
## Root Cause
|
||||
|
||||
Previous instructions told model to "execute immediately" but didn't teach:
|
||||
1. **Planning** - What needs to be done
|
||||
2. **Checking** - What already exists
|
||||
3. **Verification** - Did the step work
|
||||
4. **Completion loop** - Keep going until done
|
||||
|
||||
## Solution
|
||||
|
||||
Add **Task Completion Workflow** to instructions:
|
||||
|
||||
```
|
||||
**TASK COMPLETION WORKFLOW (MANDATORY):**
|
||||
|
||||
**1. PLAN:** List ALL steps needed before starting
|
||||
**2. CHECK:** Use ls to verify what exists before creating
|
||||
**3. EXECUTE:** Run first step
|
||||
**4. VERIFY:** Confirm step worked (ls, read file)
|
||||
**5. REPEAT:** Steps 3-4 until ALL complete
|
||||
**6. FINAL CHECK:** Verify entire task is done
|
||||
**7. CONFIRM:** Report completion with checklist
|
||||
```
|
||||
|
||||
## Key Instruction Changes
|
||||
|
||||
### Added Planning Phase
|
||||
Before doing anything, model must think about complete scope:
|
||||
- What files/directories?
|
||||
- What dependencies?
|
||||
- Complete task requirements
|
||||
|
||||
### Added Verification Steps
|
||||
Every step must be verified:
|
||||
- `ls -la` after mkdir
|
||||
- `read` file after write
|
||||
- Check content is correct
|
||||
|
||||
### Added Completion Loop
|
||||
Model must continue until:
|
||||
✓ All directories exist
|
||||
✓ All files exist with correct content
|
||||
✓ All dependencies installed
|
||||
✓ Each component verified
|
||||
|
||||
### Complete Working Example
|
||||
Provided 13-step React example showing:
|
||||
1. Check existing (ls)
|
||||
2. Create directory
|
||||
3. Verify created (ls)
|
||||
4. Create package.json
|
||||
5. Verify package.json (read)
|
||||
6. Create source files
|
||||
7. Final verification (find myapp -type f)
|
||||
8. Install dependencies
|
||||
9. Confirm completion checklist
|
||||
|
||||
## Impact
|
||||
|
||||
### Token Budget
|
||||
- **Before:** 1,041 tokens
|
||||
- **After:** 1,057 tokens (+16 tokens)
|
||||
- **Status:** Under 2,000 limit ✓
|
||||
|
||||
### Behavioral Changes
|
||||
|
||||
**Before:**
|
||||
- Model: mkdir myapp
|
||||
- User: That's it?
|
||||
- Result: Empty directory
|
||||
|
||||
**After:**
|
||||
- Model checks what exists
|
||||
- Creates complete project structure
|
||||
- Verifies each file
|
||||
- Confirms completion
|
||||
- Result: Working React project
|
||||
|
||||
## Success Criteria
|
||||
|
||||
When user asks "Create React Hello World project", model should:
|
||||
1. ✓ Check current directory contents
|
||||
2. ✓ Create myapp/ directory
|
||||
3. ✓ Verify directory created
|
||||
4. ✓ Create package.json
|
||||
5. ✓ Verify package.json content
|
||||
6. ✓ Create src/App.js
|
||||
7. ✓ Create src/index.js
|
||||
8. ✓ Create public/index.html
|
||||
9. ✓ Final verification (list all files)
|
||||
10. ✓ npm install
|
||||
11. ✓ Confirm completion checklist
|
||||
|
||||
## Testing
|
||||
|
||||
Test instructions contain:
|
||||
- PLAN/CHECK keywords
|
||||
- VERIFY keyword
|
||||
- COMPLETE keyword
|
||||
|
||||
All tests pass: 11/11 ✓
|
||||
|
||||
## Trade-offs
|
||||
|
||||
**Pros:**
|
||||
- Complete task execution
|
||||
- Verification prevents partial work
|
||||
- Clear completion criteria
|
||||
- Better user experience
|
||||
|
||||
**Cons:**
|
||||
- More tokens (but still under limit)
|
||||
- More verbose instructions
|
||||
- May be slower (more verification steps)
|
||||
|
||||
## Related Files Changed
|
||||
|
||||
1. src/api/routes.py - Updated tool_instructions
|
||||
2. tests/test_tool_parsing.py - Updated tests for new content
|
||||
3. docs/design/2024-02-24-task-planning-verification.md - This doc
|
||||
|
||||
## Future Improvements
|
||||
|
||||
1. **Task Queue System:** Server-side queue of pending operations
|
||||
2. **State Persistence:** Remember what's been done across conversations
|
||||
3. **Smart Resumption:** If interrupted, pick up where left off
|
||||
4. **Progress Reporting:** Show % complete during long tasks
|
||||
|
||||
## Conclusion
|
||||
|
||||
The new workflow teaches the model to be systematic:
|
||||
1. Plan before acting
|
||||
2. Check before creating
|
||||
3. Verify after each step
|
||||
4. Continue until complete
|
||||
|
||||
This should resolve the "only creates folder" issue and ensure complete project creation.
|
||||
@@ -0,0 +1,132 @@
|
||||
# Design Decision: Tool Parsing Simplification
|
||||
|
||||
**Date:** 2024-02-24
|
||||
**Scope:** src/api/routes.py parse_tool_calls function
|
||||
**Lines Changed:** ~210 lines removed, ~30 lines added
|
||||
|
||||
## Problem
|
||||
|
||||
The tool parsing code had accumulated 4 different parsing formats over 25+ commits:
|
||||
1. JSON `tool_calls` format with nested objects
|
||||
2. TOOL:/ARGUMENTS: format (simple text)
|
||||
3. Function pattern format `func_name(args)`
|
||||
4. Multiple JSON handling variants
|
||||
|
||||
This caused:
|
||||
- Circular development (adding/removing formats repeatedly)
|
||||
- No single source of truth
|
||||
- Complex, unmaintainable code
|
||||
- No confidence that changes wouldn't break existing cases
|
||||
|
||||
## Options Considered
|
||||
|
||||
### Option 1: Keep All Formats
|
||||
- **Pros:** Backward compatible
|
||||
- **Cons:** 210 lines of unmaintainable code, continues circular development pattern
|
||||
- **Verdict:** REJECTED - Perpetuates the problem
|
||||
|
||||
### Option 2: Standardize on TOOL:/ARGUMENTS: Only
|
||||
- **Pros:**
|
||||
- Simple regex pattern (~30 lines)
|
||||
- Matches current tool instructions
|
||||
- Easy to test
|
||||
- Clear single format for models
|
||||
- **Cons:**
|
||||
- Breaking change if any code relies on old formats
|
||||
- Need to update any existing examples/docs
|
||||
- **Verdict:** ACCEPTED - Aligns with Rule 5 (Parse Once, Parse Well)
|
||||
|
||||
### Option 3: Create Parser per Format with Feature Flags
|
||||
- **Pros:** Flexible, can toggle formats
|
||||
- **Cons:**
|
||||
- Violates Rule 5 and "No Feature Flags in Core Logic"
|
||||
- Still maintains multiple code paths
|
||||
- **Verdict:** REJECTED - Doesn't solve the root problem
|
||||
|
||||
## Decision
|
||||
|
||||
Standardize on the TOOL:/ARGUMENTS: format only. Remove all other parsing code.
|
||||
|
||||
**Rationale:**
|
||||
- Per DEVELOPMENT_PATTERNS.md recommendation #3: "One Format Only"
|
||||
- Token cost is minimal (no complex regex)
|
||||
- Test coverage provides confidence
|
||||
- Aligns with existing tool instructions
|
||||
|
||||
## Impact
|
||||
|
||||
### Token Count
|
||||
- **Parser code:** 210 lines → 30 lines (-180 lines)
|
||||
- **No change** to tool instructions (separate optimization)
|
||||
|
||||
### Breaking Changes
|
||||
- **Yes** - Removes support for:
|
||||
- JSON `tool_calls` format in model responses
|
||||
- Function pattern format `read_file(path="test.txt")`
|
||||
|
||||
**Migration:** Models must use:
|
||||
```
|
||||
TOOL: read
|
||||
ARGUMENTS: {"filePath": "test.txt"}
|
||||
```
|
||||
|
||||
### Testing
|
||||
- Unit tests added: 9 test cases
|
||||
- Coverage: All parsing scenarios
|
||||
- All tests pass
|
||||
|
||||
## Implementation
|
||||
|
||||
```python
|
||||
# New implementation (30 lines)
|
||||
def parse_tool_calls(text: str) -> tuple:
|
||||
"""Parse tool calls using standardized format."""
|
||||
import json
|
||||
import re
|
||||
|
||||
tool_pattern = r'TOOL:\s*(\w+)\s*\nARGUMENTS:\s*(\{[^}]*\})'
|
||||
tool_matches = list(re.finditer(tool_pattern, text, re.IGNORECASE))
|
||||
|
||||
if not tool_matches:
|
||||
return text, None
|
||||
|
||||
tool_calls = []
|
||||
for i, tool_match in enumerate(tool_matches):
|
||||
tool_name = tool_match.group(1)
|
||||
args_str = tool_match.group(2)
|
||||
try:
|
||||
args_dict = json.loads(args_str)
|
||||
tool_calls.append({
|
||||
"id": f"call_{i+1}",
|
||||
"type": "function",
|
||||
"function": {
|
||||
"name": tool_name,
|
||||
"arguments": json.dumps(args_dict)
|
||||
}
|
||||
})
|
||||
except json.JSONDecodeError:
|
||||
continue
|
||||
|
||||
if not tool_calls:
|
||||
return text, None
|
||||
|
||||
first_start = tool_matches[0].start()
|
||||
content = text[:first_start].strip()
|
||||
|
||||
return content, tool_calls
|
||||
```
|
||||
|
||||
## Verification
|
||||
|
||||
Run tests:
|
||||
```bash
|
||||
python tests/test_tool_parsing.py
|
||||
```
|
||||
|
||||
Expected: 9 passed, 0 failed
|
||||
|
||||
## Follow-up
|
||||
|
||||
- [x] Update DEVELOPMENT_PATTERNS.md to mark as completed
|
||||
- [x] Add unit tests
|
||||
- [ ] Consider integration test for full tool execution flow
|
||||
@@ -0,0 +1,112 @@
|
||||
# Test Plan: Fix Tool Execution and Token Reporting
|
||||
|
||||
## Problem Analysis
|
||||
|
||||
### Issue 1: Model Gives Instructions Instead of Executing
|
||||
**Current behavior:** Model describes what to do ("You should run mkdir...") instead of using TOOL: format
|
||||
**Expected:** Model responds with TOOL: bash\nARGUMENTS: {"command": "mkdir..."}
|
||||
|
||||
### Issue 2: Token Counting Inaccurate
|
||||
**Current:** Rough estimate `len(prompt) // 4`
|
||||
**Expected:** Accurate token count using tiktoken
|
||||
**Impact:** opencode can't properly manage context window
|
||||
|
||||
### Issue 3: npx Commands Timeout/Need Input
|
||||
**Current:** `npx create-react-app .` prompts for confirmation (y/n)
|
||||
**Expected:** Non-interactive execution or manual file creation
|
||||
**Evidence:** "Need to install the following packages: create-react-app@5.1.0 Ok to proceed? (y)"
|
||||
|
||||
## Unit Tests
|
||||
|
||||
### Test 1: Accurate Token Counting
|
||||
- [ ] Verify token count uses tiktoken (not rough estimate)
|
||||
- [ ] Test with known token counts
|
||||
- [ ] Verify prompt_tokens + completion_tokens = total_tokens
|
||||
|
||||
### Test 2: Non-Interactive Bash Commands
|
||||
- [ ] Verify npm/npx commands use --yes or equivalent flags
|
||||
- [ ] Test timeout handling for package managers
|
||||
- [ ] Verify commands don't prompt for user input
|
||||
|
||||
### Test 3: Tool Instructions Content
|
||||
- [ ] Verify instructions emphasize "EXECUTE not DESCRIBE"
|
||||
- [ ] Verify manual file creation examples (not npx)
|
||||
- [ ] Verify anti-patterns are clearly stated
|
||||
|
||||
## Integration Tests
|
||||
|
||||
### Test 4: End-to-End React Project Creation
|
||||
**Input:** "Create a React Hello World app"
|
||||
|
||||
**Expected Flow:**
|
||||
1. TOOL: bash, ARGUMENTS: {"command": "mkdir myapp"}
|
||||
2. TOOL: write, ARGUMENTS: {"filePath": "myapp/package.json", "content": "..."}
|
||||
3. TOOL: write, ARGUMENTS: {"filePath": "myapp/src/App.js", "content": "..."}
|
||||
4. Continue until complete
|
||||
|
||||
**Failure Modes:**
|
||||
- [ ] Model describes steps instead of executing
|
||||
- [ ] Uses npx create-react-app (should manually create files)
|
||||
- [ ] Stops after README only
|
||||
|
||||
### Test 5: Token Reporting Accuracy
|
||||
**Input:** Any chat completion request
|
||||
|
||||
**Expected:**
|
||||
- usage.prompt_tokens matches actual tokens
|
||||
- usage.completion_tokens matches actual tokens
|
||||
- usage.total_tokens is sum
|
||||
|
||||
**Verification:**
|
||||
- Compare tiktoken count vs API response
|
||||
|
||||
## Manual Verification
|
||||
|
||||
```bash
|
||||
# Test React creation
|
||||
python main.py --auto &
|
||||
curl -X POST http://localhost:17615/v1/chat/completions \
|
||||
-H "Content-Type: application/json" \
|
||||
-H "X-Client-Working-Dir: /tmp/test-project" \
|
||||
-d '{
|
||||
"model": "local-swarm",
|
||||
"messages": [{"role": "user", "content": "Create a React Hello World app"}],
|
||||
"tools": [{"type": "function", "function": {"name": "bash"}}, {"type": "function", "function": {"name": "write"}}]
|
||||
}'
|
||||
|
||||
# Check token accuracy
|
||||
curl -X POST http://localhost:17615/v1/chat/completions \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{
|
||||
"model": "local-swarm",
|
||||
"messages": [{"role": "user", "content": "Hello"}]
|
||||
}' | jq '.usage'
|
||||
```
|
||||
|
||||
## Success Criteria
|
||||
|
||||
1. **Execution:** 100% of requests use TOOL: format (not descriptions)
|
||||
2. **Accuracy:** Token counts match tiktoken within ±5%
|
||||
3. **Completion:** Multi-file projects fully created via write tool
|
||||
4. **No npx:** Manual file creation for React (no npx create-react-app)
|
||||
|
||||
## Implementation Notes
|
||||
|
||||
### Token Counting Fix
|
||||
```python
|
||||
# Replace: prompt_tokens = len(prompt) // 4
|
||||
# With:
|
||||
import tiktoken
|
||||
encoding = tiktoken.get_encoding('cl100k_base')
|
||||
prompt_tokens = len(encoding.encode(prompt))
|
||||
completion_tokens = len(encoding.encode(content))
|
||||
```
|
||||
|
||||
### Tool Instructions Fix
|
||||
- Add explicit "DO NOT USE npx create-react-app" instruction
|
||||
- Add "EXECUTE IMMEDIATELY" mandate
|
||||
- Show complete React example with manual file creation
|
||||
|
||||
### Non-Interactive Commands
|
||||
- Auto-add --yes to npx commands
|
||||
- Or recommend manual file creation instead
|
||||
@@ -0,0 +1,97 @@
|
||||
# Test Plan: Improved Tool Instructions
|
||||
|
||||
## Problem Statement
|
||||
Model is not using tools effectively:
|
||||
1. Creates README instead of actual project structure
|
||||
2. Provides commands as text instead of executing them
|
||||
3. Refuses to run commands claiming "I am only an AI assistant"
|
||||
|
||||
## Root Cause Analysis
|
||||
Current instructions don't clearly communicate:
|
||||
- That the model SHOULD use tools proactively
|
||||
- That execution is expected, not explanation
|
||||
- The workflow: user request → tool execution → result
|
||||
|
||||
## Unit Tests (Instruction Verification)
|
||||
|
||||
### Test 1: Instruction Presence
|
||||
- [ ] Verify instructions are injected into system message
|
||||
- [ ] Verify instructions appear at the START of system message (priority position)
|
||||
|
||||
### Test 2: Token Count
|
||||
- [ ] Measure total token count of new instructions
|
||||
- [ ] Verify ≤ 500 tokens (conservative budget)
|
||||
- [ ] Document before/after
|
||||
|
||||
### Test 3: Format Compliance
|
||||
- [ ] Verify instructions include TOOL:/ARGUMENTS: format
|
||||
- [ ] Verify examples use correct format
|
||||
- [ ] Verify rules are clear and numbered
|
||||
|
||||
## Integration Tests (Behavioral)
|
||||
|
||||
### Test 4: Project Creation Flow
|
||||
**Input:** "Create a React Hello World app"
|
||||
|
||||
**Expected Behavior:**
|
||||
1. Model responds with TOOL: bash, ARGUMENTS: mkdir myapp
|
||||
2. After result, TOOL: write, ARGUMENTS: package.json content
|
||||
3. After result, TOOL: write, ARGUMENTS: src/App.js content
|
||||
4. Continue until complete project structure exists
|
||||
|
||||
**Failure Modes:**
|
||||
- [ ] Model only describes what to do
|
||||
- [ ] Model creates README only
|
||||
- [ ] Model refuses to execute commands
|
||||
|
||||
### Test 5: Multi-step Task
|
||||
**Input:** "Check what files exist, then create a test.txt file with 'hello' in it"
|
||||
|
||||
**Expected Behavior:**
|
||||
1. TOOL: bash, ARGUMENTS: ls -la
|
||||
2. Wait for result
|
||||
3. TOOL: write, ARGUMENTS: test.txt with "hello"
|
||||
|
||||
**Failure Modes:**
|
||||
- [ ] Model tries to do both in one response
|
||||
- [ ] Model doesn't wait for ls result before writing
|
||||
|
||||
### Test 6: Command Refusal
|
||||
**Input:** "Run npm install"
|
||||
|
||||
**Expected Behavior:**
|
||||
1. TOOL: bash, ARGUMENTS: npm install
|
||||
|
||||
**Failure Modes:**
|
||||
- [ ] Model responds: "I cannot run commands, I am only an AI assistant"
|
||||
- [ ] Model explains npm install instead of running it
|
||||
|
||||
## Manual Verification Commands
|
||||
|
||||
```bash
|
||||
# Start the server
|
||||
python main.py --auto
|
||||
|
||||
# In another terminal, test with curl
|
||||
curl -X POST http://localhost:17615/v1/chat/completions \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{
|
||||
"model": "local-swarm",
|
||||
"messages": [{"role": "user", "content": "Create a React Hello World app"}],
|
||||
"tools": [{"type": "function", "function": {"name": "bash", "description": "Run shell commands"}}, {"type": "function", "function": {"name": "write", "description": "Write files"}}]
|
||||
}'
|
||||
```
|
||||
|
||||
## Success Criteria
|
||||
|
||||
1. **Proactivity:** Model uses tools without being asked twice
|
||||
2. **Execution:** Model runs commands, doesn't just describe them
|
||||
3. **No Refusal:** Model never says "I cannot" or "I am only an AI"
|
||||
4. **Completeness:** Multi-file projects are fully created via tools
|
||||
5. **Format:** 100% of tool calls use correct TOOL:/ARGUMENTS: format
|
||||
|
||||
## Metrics
|
||||
|
||||
- **Tool usage rate:** % of requests that result in tool calls
|
||||
- **Format compliance:** % of tool calls in correct format
|
||||
- **Completion rate:** % of multi-step tasks fully completed
|
||||
@@ -0,0 +1,35 @@
|
||||
# Test Plan: Tool Parsing Simplification
|
||||
|
||||
## Unit Tests
|
||||
|
||||
- [x] Test case 1: Single tool call → Returns 1 tool with correct name and arguments
|
||||
- [x] Test case 2: No tool in text → Returns None for tools, original text as content
|
||||
- [x] Test case 3: Multiple tools → Returns all tools in order
|
||||
- [x] Test case 4: Content before tool → Content extracted, tool parsed correctly
|
||||
- [x] Test case 5: Bash tool → Correctly parses bash command
|
||||
- [x] Test case 6: Case insensitive → "tool:" and "TOOL:" both work
|
||||
- [x] Test case 7: Invalid JSON → Skips invalid, continues with valid
|
||||
- [x] Test case 8: Empty text → Returns None, empty string
|
||||
- [x] Test case 9: Whitespace only → Returns None
|
||||
|
||||
## Integration Tests
|
||||
|
||||
- [ ] End-to-end flow:
|
||||
1. Send chat completion request with tools
|
||||
2. Model responds with TOOL:/ARGUMENTS: format
|
||||
3. Parser extracts tool call
|
||||
4. Tool executes
|
||||
5. Result returned in response
|
||||
|
||||
- [ ] Expected result: Tool executes successfully, result included in response
|
||||
|
||||
## Manual Verification
|
||||
|
||||
- [ ] Command: `python tests/test_tool_parsing.py`
|
||||
- [ ] Expected output: "9 passed, 0 failed"
|
||||
|
||||
## Token Budget Verification
|
||||
|
||||
- Parser code: ~30 lines (~200 tokens)
|
||||
- Well under 2000 token limit
|
||||
- Simple regex pattern maintains low complexity
|
||||
@@ -45,6 +45,10 @@ from interactive import (
|
||||
)
|
||||
from network import create_discovery_service, FederatedSwarm
|
||||
from tools.executor import ToolExecutor, set_tool_executor
|
||||
from utils.logging_config import setup_logging
|
||||
|
||||
# Set up logging (DEBUG level for development)
|
||||
setup_logging()
|
||||
|
||||
|
||||
async def setup_swarm(model_config, hardware):
|
||||
|
||||
@@ -4,6 +4,7 @@ pyyaml>=6.0
|
||||
requests>=2.31.0
|
||||
tqdm>=4.65.0
|
||||
psutil>=5.9.0
|
||||
tiktoken>=0.5.0
|
||||
|
||||
# API server
|
||||
fastapi>=0.104.0
|
||||
|
||||
@@ -0,0 +1,34 @@
|
||||
#!/usr/bin/env python3
|
||||
import re
|
||||
|
||||
# Read the file
|
||||
with open('src/api/routes.py', 'r') as f:
|
||||
lines = f.readlines()
|
||||
|
||||
# Find the line with 'logger = logging.getLogger(__name__)'
|
||||
has_logger = any('logger = logging.getLogger(__name__)' in line for line in lines)
|
||||
|
||||
if not has_logger:
|
||||
# Find where to insert (after TOKEN_ENCODING line)
|
||||
for i, line in enumerate(lines):
|
||||
if 'TOKEN_ENCODING = tiktoken.get_encoding' in line:
|
||||
lines.insert(i + 1, '\n')
|
||||
lines.insert(i + 2, '# Set up logger\n')
|
||||
lines.insert(i + 3, 'logger = logging.getLogger(__name__)\n')
|
||||
break
|
||||
|
||||
# Replace print statements
|
||||
new_lines = []
|
||||
for line in lines:
|
||||
# Replace print(f"...) with logger.debug(f"...")
|
||||
if 'print(f"' in line and not line.strip().startswith('#'):
|
||||
line = line.replace('print(f"', 'logger.debug(f"')
|
||||
elif 'print(f\'' in line and not line.strip().startswith('#'):
|
||||
line = line.replace('print(f\'', 'logger.debug(f\'')
|
||||
new_lines.append(line)
|
||||
|
||||
# Write back
|
||||
with open('src/api/routes.py', 'w') as f:
|
||||
f.writelines(new_lines)
|
||||
|
||||
print('Done! Replaced print statements with logger.debug')
|
||||
@@ -0,0 +1,44 @@
|
||||
#!/usr/bin/env python3
|
||||
import re
|
||||
import sys
|
||||
|
||||
filepath = sys.argv[1]
|
||||
|
||||
# Read the file
|
||||
with open(filepath, 'r') as f:
|
||||
lines = f.readlines()
|
||||
|
||||
# Find the line with 'logger = logging.getLogger(__name__)'
|
||||
has_logger = any('logger = logging.getLogger(__name__)' in line for line in lines)
|
||||
has_logging_import = any('import logging' in line for line in lines)
|
||||
|
||||
if not has_logging_import:
|
||||
# Find where to insert import
|
||||
for i, line in enumerate(lines):
|
||||
if line.startswith('import ') or line.startswith('from '):
|
||||
lines.insert(i, 'import logging\n')
|
||||
break
|
||||
|
||||
if not has_logger:
|
||||
# Find where to insert logger (after imports)
|
||||
for i, line in enumerate(lines):
|
||||
if line.startswith('class ') or line.startswith('def '):
|
||||
lines.insert(i, '\n')
|
||||
lines.insert(i + 1, 'logger = logging.getLogger(__name__)\n')
|
||||
break
|
||||
|
||||
# Replace print statements
|
||||
new_lines = []
|
||||
for line in lines:
|
||||
# Replace print(f"...) with logger.debug(f"...")
|
||||
if 'print(f"' in line and not line.strip().startswith('#'):
|
||||
line = line.replace('print(f"', 'logger.debug(f"')
|
||||
elif 'print(f\'' in line and not line.strip().startswith('#'):
|
||||
line = line.replace('print(f\'', 'logger.debug(f\'')
|
||||
new_lines.append(line)
|
||||
|
||||
# Write back
|
||||
with open(filepath, 'w') as f:
|
||||
f.writelines(new_lines)
|
||||
|
||||
print(f'Done! Fixed logging in {filepath}')
|
||||
@@ -0,0 +1,87 @@
|
||||
#!/usr/bin/env python3
|
||||
"""Script to replace print statements with logging in Python files."""
|
||||
|
||||
import re
|
||||
import sys
|
||||
|
||||
def replace_prints_in_file(filepath):
|
||||
"""Replace print statements with logger calls in a file."""
|
||||
with open(filepath, 'r') as f:
|
||||
content = f.read()
|
||||
|
||||
original_content = content
|
||||
|
||||
# Add logger import if not present
|
||||
if 'logger = logging.getLogger(__name__)' not in content and 'import logging' in content:
|
||||
# Already has logging import but no logger setup
|
||||
pass
|
||||
elif 'import logging' not in content:
|
||||
# Need to add logging import
|
||||
lines = content.split('\n')
|
||||
import_idx = 0
|
||||
for i, line in enumerate(lines):
|
||||
if line.startswith('import ') or line.startswith('from '):
|
||||
import_idx = i + 1
|
||||
lines.insert(import_idx, 'import logging')
|
||||
lines.insert(import_idx + 1, '')
|
||||
lines.insert(import_idx + 2, 'logger = logging.getLogger(__name__)')
|
||||
content = '\n'.join(lines)
|
||||
|
||||
# Replace simple print statements with logger.debug
|
||||
# Pattern: print(f"...")
|
||||
content = re.sub(
|
||||
r'^(\s*)print\(f"([^"]+)"\)',
|
||||
r'\1logger.debug(f"\2")',
|
||||
content,
|
||||
flags=re.MULTILINE
|
||||
)
|
||||
|
||||
# Pattern: print(f'...')
|
||||
content = re.sub(
|
||||
r"^(\s*)print\(f'([^']+)'\)",
|
||||
r'\1logger.debug(f"\2")',
|
||||
content,
|
||||
flags=re.MULTILINE
|
||||
)
|
||||
|
||||
# Pattern: print("...")
|
||||
content = re.sub(
|
||||
r'^(\s*)print\("([^"]+)"\)',
|
||||
r'\1logger.debug("\2")',
|
||||
content,
|
||||
flags=re.MULTILINE
|
||||
)
|
||||
|
||||
# Pattern: print(f"...", end="")
|
||||
content = re.sub(
|
||||
r'^(\s*)print\(f"([^"]+)",\s*end="[^"]*"\)',
|
||||
r'\1logger.debug(f"\2")',
|
||||
content,
|
||||
flags=re.MULTILINE
|
||||
)
|
||||
|
||||
# Pattern: print(f"..." \n f"...") - multiline
|
||||
content = re.sub(
|
||||
r'print\(f"([^"]+)"\s*\n\s*f"',
|
||||
r'logger.debug(f"\1" \n f"',
|
||||
content
|
||||
)
|
||||
|
||||
with open(filepath, 'w') as f:
|
||||
f.write(content)
|
||||
|
||||
# Count changes
|
||||
changes = content.count('logger.debug') - original_content.count('logger.debug')
|
||||
if changes > 0:
|
||||
print(f"Replaced ~{changes} print statements in {filepath}")
|
||||
|
||||
return changes
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
if len(sys.argv) < 2:
|
||||
print("Usage: python replace_prints.py <filepath>")
|
||||
sys.exit(1)
|
||||
|
||||
filepath = sys.argv[1]
|
||||
replace_prints_in_file(filepath)
|
||||
+2
-1
@@ -91,7 +91,7 @@ class ChatCompletionResponse(BaseModel):
|
||||
class ChatCompletionStreamChoice(BaseModel):
|
||||
"""A choice in streaming response."""
|
||||
index: int = Field(default=0, description="Choice index")
|
||||
delta: Dict[str, str] = Field(..., description="Content delta")
|
||||
delta: Dict[str, Any] = Field(..., description="Content delta (can include 'content', 'tool_calls', etc.)")
|
||||
finish_reason: Optional[str] = Field(default=None, description="Reason for finishing")
|
||||
|
||||
|
||||
@@ -102,6 +102,7 @@ class ChatCompletionStreamResponse(BaseModel):
|
||||
created: int = Field(..., description="Unix timestamp")
|
||||
model: str = Field(..., description="Model used")
|
||||
choices: List[ChatCompletionStreamChoice] = Field(..., description="Content chunks")
|
||||
usage: Optional[UsageInfo] = Field(default=None, description="Token usage (only in final chunk)")
|
||||
|
||||
|
||||
class ModelInfo(BaseModel):
|
||||
|
||||
+538
-309
File diff suppressed because it is too large
Load Diff
+25
-9
@@ -153,28 +153,28 @@ MLX_QUALITY_MAP = {
|
||||
MODEL_METADATA = {
|
||||
"qwen2.5-coder": {
|
||||
"name": "Qwen 2.5 Coder",
|
||||
"description": "Alibaba's code-focused model, excellent for small sizes",
|
||||
"description": "Alibaba's code-focused Instruct model, excellent for small sizes",
|
||||
"priority": 1,
|
||||
"max_context": 128000,
|
||||
"variants": ["3b", "7b", "14b"],
|
||||
},
|
||||
"deepseek-coder": {
|
||||
"name": "DeepSeek Coder",
|
||||
"description": "DeepSeek's code model, good alternative",
|
||||
"description": "DeepSeek's code model (Instruct variant)",
|
||||
"priority": 2,
|
||||
"max_context": 16384,
|
||||
"variants": ["1.3b", "6.7b"],
|
||||
},
|
||||
"codellama": {
|
||||
"name": "CodeLlama",
|
||||
"description": "Meta's code model",
|
||||
"description": "Meta's code model (Instruct variant)",
|
||||
"priority": 3,
|
||||
"max_context": 16384,
|
||||
"variants": ["7b", "13b"],
|
||||
},
|
||||
"llama-3.2": {
|
||||
"name": "Llama 3.2",
|
||||
"description": "Meta's latest general-purpose model with strong coding abilities",
|
||||
"description": "Meta's latest general-purpose model with strong coding abilities (Instruct variant)",
|
||||
"priority": 4,
|
||||
"max_context": 128000,
|
||||
"variants": ["1b", "3b"],
|
||||
@@ -195,10 +195,10 @@ MODEL_METADATA = {
|
||||
},
|
||||
"starcoder2": {
|
||||
"name": "StarCoder2",
|
||||
"description": "BigCode's open code generation model",
|
||||
"description": "BigCode's open code generation model (Instruct variant)",
|
||||
"priority": 7,
|
||||
"max_context": 8192,
|
||||
"variants": ["3b", "7b", "15b"],
|
||||
"variants": ["15b"], # Only 15b has Instruct variant on MLX
|
||||
},
|
||||
}
|
||||
|
||||
@@ -351,22 +351,38 @@ def get_model_hf_repo(model_id: str, variant: ModelVariant, quant: QuantizationC
|
||||
|
||||
def get_model_hf_repo_mlx(model_id: str, variant: ModelVariant, quant: QuantizationConfig) -> str:
|
||||
"""Get the HuggingFace repository path for MLX quantized models (Apple Silicon)."""
|
||||
# Map GGUF quantization names to MLX quantization names
|
||||
# MLX uses simple names: 3bit, 4bit, 8bit, not q4_k_m, q6_k, etc.
|
||||
gguf_to_mlx_quant = {
|
||||
"q3_k_m": "3bit",
|
||||
"q4_k_m": "4bit",
|
||||
"q4_k": "4bit",
|
||||
"q5_k_m": "5bit",
|
||||
"q5_k": "5bit",
|
||||
"q6_k": "6bit",
|
||||
"q8_0": "8bit",
|
||||
"q8": "8bit",
|
||||
}
|
||||
|
||||
# MLX quantized models are in mlx-community org with -{quant}bit suffix
|
||||
# Map base model names to mlx-community quantized versions
|
||||
# IMPORTANT: Always use Instruct variants for instruction-following
|
||||
mlx_repo_map = {
|
||||
"qwen2.5-coder": f"mlx-community/Qwen2.5-Coder-{variant.size.capitalize()}-Instruct",
|
||||
"deepseek-coder": f"mlx-community/deepseek-coder-{variant.size}-base",
|
||||
"deepseek-coder": f"mlx-community/deepseek-coder-{variant.size}-instruct-mlx",
|
||||
"codellama": f"mlx-community/CodeLlama-{variant.size}-Instruct",
|
||||
"llama-3.2": f"mlx-community/Llama-3.2-{variant.size}-Instruct",
|
||||
"phi-4": f"mlx-community/phi-4",
|
||||
"gemma-2": f"mlx-community/gemma-2-{variant.size}-it",
|
||||
"starcoder2": f"mlx-community/starcoder2-{variant.size}",
|
||||
"starcoder2": f"mlx-community/starcoder2-{variant.size}-instruct-v0.1",
|
||||
}
|
||||
|
||||
base_repo = mlx_repo_map.get(model_id, "")
|
||||
if base_repo and quant:
|
||||
# Convert GGUF quant name to MLX quant name
|
||||
mlx_quant = gguf_to_mlx_quant.get(quant.name, quant.name)
|
||||
# Append quantization suffix
|
||||
return f"{base_repo}-{quant.name}"
|
||||
return f"{base_repo}-{mlx_quant}"
|
||||
return base_repo
|
||||
|
||||
|
||||
|
||||
+187
-23
@@ -5,12 +5,15 @@ Remote execution allows a single "tool host" to manage the workspace
|
||||
while workers perform distributed generation.
|
||||
"""
|
||||
|
||||
import logging
|
||||
import os
|
||||
import subprocess
|
||||
import aiohttp
|
||||
from typing import Optional
|
||||
|
||||
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
class ToolExecutor:
|
||||
"""Executes tools either locally or remotely via a tool host."""
|
||||
|
||||
@@ -52,7 +55,7 @@ class ToolExecutor:
|
||||
async def _execute_remote(self, tool_name: str, tool_args: dict) -> str:
|
||||
"""Execute tool on remote tool host."""
|
||||
try:
|
||||
print(f" 🔧 Remote tool call: {tool_name}({tool_args})")
|
||||
logger.debug(f" 🔧 Remote tool call: {tool_name}({tool_args})")
|
||||
session = await self._get_session()
|
||||
url = f"{self.tool_host_url}/v1/tools/execute"
|
||||
|
||||
@@ -61,21 +64,50 @@ class ToolExecutor:
|
||||
"arguments": tool_args
|
||||
}
|
||||
|
||||
# If working_dir is specified in tool_args, preserve it for remote execution
|
||||
# The remote tool server will extract and use it
|
||||
if 'working_dir' in tool_args:
|
||||
logger.debug(f" 📍 Remote working_dir: {tool_args['working_dir']}")
|
||||
|
||||
async with session.post(url, json=payload) as resp:
|
||||
if resp.status == 200:
|
||||
data = await resp.json()
|
||||
result = data.get("result", "No result from tool host")
|
||||
print(f" ✅ Tool result received ({len(result)} chars)")
|
||||
logger.debug(f" ✅ Tool result received ({len(result)} chars)")
|
||||
return result
|
||||
else:
|
||||
error_text = await resp.text()
|
||||
print(f" ❌ Tool host error: {resp.status}")
|
||||
logger.debug(f" ❌ Tool host error: {resp.status}")
|
||||
return f"Tool host error ({resp.status}): {error_text}"
|
||||
|
||||
except Exception as e:
|
||||
print(f" ❌ Error contacting tool host: {e}")
|
||||
logger.debug(f" ❌ Error contacting tool host: {e}")
|
||||
return f"Error contacting tool host: {str(e)}"
|
||||
|
||||
|
||||
def _discover_project_root(self, start_dir: Optional[str] = None) -> str:
|
||||
"""Discover the project root directory by looking for common markers."""
|
||||
import os
|
||||
if start_dir is None:
|
||||
start_dir = os.getcwd()
|
||||
current = os.path.abspath(start_dir)
|
||||
|
||||
# Common project root markers
|
||||
markers = ['.git', 'package.json', 'pyproject.toml', 'Cargo.toml', 'go.mod',
|
||||
'requirements.txt', 'setup.py', 'pom.xml', 'build.gradle', '.project', '.venv']
|
||||
|
||||
while True:
|
||||
try:
|
||||
if any(os.path.exists(os.path.join(current, marker)) for marker in markers):
|
||||
return current
|
||||
except Exception:
|
||||
pass # Permission errors, just skip
|
||||
parent = os.path.dirname(current)
|
||||
if parent == current: # Reached filesystem root
|
||||
break
|
||||
current = parent
|
||||
|
||||
return start_dir
|
||||
|
||||
async def _execute_local(self, tool_name: str, tool_args: dict) -> str:
|
||||
"""Execute tool locally."""
|
||||
try:
|
||||
@@ -102,6 +134,8 @@ class ToolExecutor:
|
||||
async def _execute_read(self, args: dict) -> str:
|
||||
"""Execute read tool."""
|
||||
file_path = args.get("filePath", "")
|
||||
working_dir = args.get("working_dir", os.getcwd()) # Optional: override cwd
|
||||
|
||||
if not file_path:
|
||||
return "Error: filePath required"
|
||||
|
||||
@@ -110,17 +144,39 @@ class ToolExecutor:
|
||||
if file_path.startswith("..") or file_path.startswith("/.."):
|
||||
return "Error: Directory traversal not allowed"
|
||||
|
||||
if os.path.exists(file_path):
|
||||
with open(file_path, 'r') as f:
|
||||
content = f.read()
|
||||
return f"File contents ({len(content)} chars):\n{content[:3000]}" # Limit output
|
||||
# Resolve path relative to working_dir if not absolute
|
||||
if not os.path.isabs(file_path):
|
||||
full_path = os.path.join(working_dir, file_path)
|
||||
else:
|
||||
return f"Error: File '{file_path}' not found"
|
||||
full_path = file_path
|
||||
|
||||
# Additional security: ensure resolved path is within working_dir
|
||||
try:
|
||||
real_working_dir = os.path.realpath(working_dir)
|
||||
real_full_path = os.path.realpath(full_path)
|
||||
if not real_full_path.startswith(real_working_dir):
|
||||
return f"Error: Access denied - path outside working directory"
|
||||
except Exception:
|
||||
pass # If realpath fails, continue anyway
|
||||
|
||||
logger.debug(f" 📁 Reading: {file_path}")
|
||||
logger.debug(f" 📍 Working dir: {working_dir}")
|
||||
logger.debug(f" 🔍 Full path: {full_path}")
|
||||
|
||||
if os.path.exists(full_path):
|
||||
with open(full_path, 'r') as f:
|
||||
content = f.read()
|
||||
result = f"File contents ({len(content)} chars):\n{content[:3000]}" # Limit output
|
||||
logger.debug(f" ✓ Read {len(content)} chars")
|
||||
return result
|
||||
else:
|
||||
return f"Error: File '{full_path}' not found"
|
||||
|
||||
async def _execute_write(self, args: dict) -> str:
|
||||
"""Execute write tool."""
|
||||
file_path = args.get("filePath", "")
|
||||
content = args.get("content", "")
|
||||
working_dir = args.get("working_dir", os.getcwd()) # Optional: override cwd
|
||||
|
||||
if not file_path:
|
||||
return "Error: filePath required"
|
||||
@@ -130,19 +186,42 @@ class ToolExecutor:
|
||||
if file_path.startswith("..") or file_path.startswith("/.."):
|
||||
return "Error: Directory traversal not allowed"
|
||||
|
||||
# Resolve path relative to working_dir if not absolute
|
||||
if not os.path.isabs(file_path):
|
||||
full_path = os.path.join(working_dir, file_path)
|
||||
else:
|
||||
full_path = file_path
|
||||
|
||||
# Additional security: ensure resolved path is within working_dir
|
||||
try:
|
||||
real_working_dir = os.path.realpath(working_dir)
|
||||
real_full_path = os.path.realpath(full_path)
|
||||
if not real_full_path.startswith(real_working_dir):
|
||||
return f"Error: Access denied - path outside working directory"
|
||||
except Exception:
|
||||
pass # If realpath fails, continue anyway
|
||||
|
||||
logger.debug(f" 📁 Writing: {file_path}")
|
||||
logger.debug(f" 📍 Working dir: {working_dir}")
|
||||
logger.debug(f" 🔍 Full path: {full_path}")
|
||||
|
||||
# Create parent directories if needed
|
||||
parent_dir = os.path.dirname(file_path)
|
||||
parent_dir = os.path.dirname(full_path)
|
||||
if parent_dir and not os.path.exists(parent_dir):
|
||||
os.makedirs(parent_dir, exist_ok=True)
|
||||
logger.debug(f" 📁 Created directory: {parent_dir}")
|
||||
|
||||
with open(file_path, 'w') as f:
|
||||
with open(full_path, 'w') as f:
|
||||
f.write(content)
|
||||
|
||||
return f"Successfully wrote {len(content)} characters to {file_path}"
|
||||
result = f"Successfully wrote {len(content)} characters to {full_path}"
|
||||
logger.debug(f" ✓ Write complete")
|
||||
return result
|
||||
|
||||
async def _execute_bash(self, args: dict) -> str:
|
||||
"""Execute bash tool."""
|
||||
command = args.get("command", "")
|
||||
cwd = args.get("cwd", os.getcwd()) # Optional: override cwd
|
||||
|
||||
if not command:
|
||||
return "Error: command required"
|
||||
@@ -153,17 +232,102 @@ class ToolExecutor:
|
||||
if d in command:
|
||||
return f"Error: Dangerous command blocked: {d}"
|
||||
|
||||
result = subprocess.run(
|
||||
command,
|
||||
shell=True,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
timeout=30,
|
||||
cwd=os.getcwd()
|
||||
)
|
||||
logger.debug(f" 🖥️ BASH: {command[:80]}{'...' if len(command) > 80 else ''}")
|
||||
logger.debug(f" 📍 Working directory: {cwd}")
|
||||
|
||||
output = result.stdout if result.returncode == 0 else f"Exit code {result.returncode}: {result.stderr}"
|
||||
return output[:3000] # Limit output
|
||||
# Determine timeout based on command type - more comprehensive detection
|
||||
timeout = 30
|
||||
command_lower = command.lower()
|
||||
|
||||
# Package managers and project setup tools
|
||||
if any(pattern in command_lower for pattern in [
|
||||
'npm', 'npx', 'yarn', 'pnpm',
|
||||
'pip', 'pip install', 'poetry', 'conda',
|
||||
'cargo', 'cargo build', 'cargo install',
|
||||
'go get', 'go mod',
|
||||
'composer', 'bundle',
|
||||
' brew ', 'apt-get', 'yum', 'pacman',
|
||||
'choco', 'scoop',
|
||||
'gem ', 'npm install', 'yarn add', 'pnpm add',
|
||||
'create-react-app', 'vue create', 'ng new', 'vite', 'next',
|
||||
'django-admin', 'rails new', 'flutter create',
|
||||
'dotnet new', 'mvn', 'gradle',
|
||||
'make ', 'cmake', 'meson',
|
||||
'python setup.py', 'setup.py install',
|
||||
'pip install -r', 'requirements.txt',
|
||||
'package.json', 'Gemfile', 'Cargo.toml', 'go.mod'
|
||||
]):
|
||||
timeout = 300 # 5 minutes for package managers and project creation
|
||||
logger.debug(f" ⏱️ Using extended timeout: {timeout}s (package manager/project creation detected)")
|
||||
elif any(pattern in command_lower for pattern in [
|
||||
'git clone', 'git pull', 'git fetch',
|
||||
'wget ', 'curl ',
|
||||
'tar ', 'zip ', 'unzip ',
|
||||
'docker ', 'podman',
|
||||
'kubectl', 'helm',
|
||||
'terraform', 'ansible',
|
||||
'rsync', 'scp'
|
||||
]):
|
||||
timeout = 120 # 2 minutes for network/file operations
|
||||
logger.debug(f" ⏱️ Using extended timeout: {timeout}s (network/file operation detected)")
|
||||
else:
|
||||
logger.debug(f" ⏱️ Using default timeout: {timeout}s")
|
||||
|
||||
logger.debug(f" 🔍 Command type: {command_lower.split()[0] if command.split() else 'unknown'}")
|
||||
|
||||
try:
|
||||
result = subprocess.run(
|
||||
command,
|
||||
shell=True,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
timeout=timeout,
|
||||
cwd=cwd,
|
||||
stdin=subprocess.DEVNULL # Prevent interactive prompts from hanging
|
||||
)
|
||||
|
||||
output = result.stdout if result.returncode == 0 else f"Exit code {result.returncode}: {result.stderr}"
|
||||
|
||||
# Show summary with detailed logging
|
||||
if result.returncode == 0:
|
||||
logger.debug(f" ✓ Exit code 0 ({len(output)} chars output, {len(result.stderr)} chars stderr)")
|
||||
# Show last 300 chars of output if it exists
|
||||
if output:
|
||||
last_part = output[-300:]
|
||||
logger.debug(f" 📄 Output tail: ...{last_part}")
|
||||
if result.stderr:
|
||||
stderr_last = result.stderr[-200:]
|
||||
logger.debug(f" ⚠️ stderr (may be normal): ...{stderr_last}")
|
||||
else:
|
||||
logger.debug(f" ✗ Exit code {result.returncode}")
|
||||
if result.stderr:
|
||||
logger.debug(f" ⚠️ stderr: {result.stderr[:500]}")
|
||||
if result.stdout:
|
||||
logger.debug(f" 📄 stdout: {result.stdout[:500]}")
|
||||
|
||||
return output[:3000] # Limit output
|
||||
|
||||
except subprocess.TimeoutExpired as e:
|
||||
# Try to capture partial output on timeout
|
||||
partial_output = ""
|
||||
if e.stdout:
|
||||
partial_output = e.stdout.decode('utf-8', errors='replace')
|
||||
|
||||
error_msg = f"Command timed out after {timeout}s"
|
||||
if partial_output:
|
||||
# Show the last 500 chars of what we got before timeout
|
||||
last_output = partial_output[-500:]
|
||||
error_msg += f"\n\nPartial output (last 500 chars):\n...{last_output}"
|
||||
else:
|
||||
error_msg += "\n\n(No output captured before timeout)"
|
||||
|
||||
logger.debug(f" ⏰ TIMEOUT after {timeout}s")
|
||||
logger.debug(f" 🔍 Command that timed out: {command[:200]}")
|
||||
if partial_output:
|
||||
logger.debug(f" 📄 Partial output (first 500 chars): {partial_output[:500]}")
|
||||
logger.debug(f" 📄 Partial output (last 500 chars): ...{partial_output[-500:]}")
|
||||
|
||||
return f"Error executing bash: {error_msg}"
|
||||
|
||||
async def close(self):
|
||||
"""Close HTTP session."""
|
||||
|
||||
@@ -0,0 +1,54 @@
|
||||
"""Logging configuration for Local Swarm.
|
||||
|
||||
Provides centralized logging setup with configurable levels.
|
||||
"""
|
||||
|
||||
import logging
|
||||
import sys
|
||||
|
||||
|
||||
def setup_logging(level=logging.DEBUG):
|
||||
"""Set up logging configuration.
|
||||
|
||||
Args:
|
||||
level: Logging level (default: DEBUG for development)
|
||||
"""
|
||||
# Create formatter
|
||||
formatter = logging.Formatter(
|
||||
'%(asctime)s - %(name)s - %(levelname)s - %(message)s',
|
||||
datefmt='%Y-%m-%d %H:%M:%S'
|
||||
)
|
||||
|
||||
# Create console handler
|
||||
console_handler = logging.StreamHandler(sys.stdout)
|
||||
console_handler.setLevel(level)
|
||||
console_handler.setFormatter(formatter)
|
||||
|
||||
# Get root logger
|
||||
root_logger = logging.getLogger()
|
||||
root_logger.setLevel(level)
|
||||
|
||||
# Remove existing handlers to avoid duplicates
|
||||
root_logger.handlers.clear()
|
||||
|
||||
# Add console handler
|
||||
root_logger.addHandler(console_handler)
|
||||
|
||||
# Set specific module loggers
|
||||
logging.getLogger('swarm').setLevel(level)
|
||||
logging.getLogger('api').setLevel(level)
|
||||
logging.getLogger('tools').setLevel(level)
|
||||
|
||||
return root_logger
|
||||
|
||||
|
||||
def get_logger(name):
|
||||
"""Get a logger with the specified name.
|
||||
|
||||
Args:
|
||||
name: Logger name (usually __name__)
|
||||
|
||||
Returns:
|
||||
logging.Logger: Configured logger
|
||||
"""
|
||||
return logging.getLogger(name)
|
||||
@@ -0,0 +1,199 @@
|
||||
"""Unit tests for tool parsing functionality."""
|
||||
|
||||
import sys
|
||||
import os
|
||||
sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..', 'src'))
|
||||
|
||||
from api.routes import parse_tool_calls
|
||||
|
||||
|
||||
def test_parse_simple_tool():
|
||||
"""Test parsing a single tool call."""
|
||||
text = 'TOOL: read\nARGUMENTS: {"filePath": "test.txt"}'
|
||||
content, tools = parse_tool_calls(text)
|
||||
assert tools is not None
|
||||
assert len(tools) == 1
|
||||
assert tools[0]["function"]["name"] == "read"
|
||||
assert tools[0]["function"]["arguments"] == '{"filePath": "test.txt"}'
|
||||
|
||||
|
||||
def test_parse_no_tool():
|
||||
"""Test parsing text without tool calls."""
|
||||
text = "Just a regular response"
|
||||
content, tools = parse_tool_calls(text)
|
||||
assert tools is None
|
||||
assert content == text
|
||||
|
||||
|
||||
def test_parse_multiple_tools():
|
||||
"""Test parsing multiple tool calls."""
|
||||
text = '''TOOL: read
|
||||
ARGUMENTS: {"filePath": "file1.txt"}
|
||||
|
||||
TOOL: write
|
||||
ARGUMENTS: {"filePath": "file2.txt", "content": "hello"}'''
|
||||
content, tools = parse_tool_calls(text)
|
||||
assert tools is not None
|
||||
assert len(tools) == 2
|
||||
assert tools[0]["function"]["name"] == "read"
|
||||
assert tools[1]["function"]["name"] == "write"
|
||||
|
||||
|
||||
def test_parse_tool_with_content_before():
|
||||
"""Test parsing when there's content before the tool call."""
|
||||
text = '''I'll read that file for you.
|
||||
|
||||
TOOL: read
|
||||
ARGUMENTS: {"filePath": "config.yaml"}'''
|
||||
content, tools = parse_tool_calls(text)
|
||||
assert tools is not None
|
||||
assert len(tools) == 1
|
||||
assert tools[0]["function"]["name"] == "read"
|
||||
assert "I'll read that file for you." in content
|
||||
|
||||
|
||||
def test_parse_bash_tool():
|
||||
"""Test parsing bash tool call."""
|
||||
text = 'TOOL: bash\nARGUMENTS: {"command": "ls -la"}'
|
||||
content, tools = parse_tool_calls(text)
|
||||
assert tools is not None
|
||||
assert len(tools) == 1
|
||||
assert tools[0]["function"]["name"] == "bash"
|
||||
|
||||
|
||||
def test_parse_case_insensitive():
|
||||
"""Test that TOOL:/ARGUMENTS: is case insensitive."""
|
||||
text = 'tool: read\narguments: {"filePath": "test.txt"}'
|
||||
content, tools = parse_tool_calls(text)
|
||||
assert tools is not None
|
||||
assert len(tools) == 1
|
||||
assert tools[0]["function"]["name"] == "read"
|
||||
|
||||
|
||||
def test_parse_invalid_json():
|
||||
"""Test that invalid JSON is skipped gracefully."""
|
||||
text = '''TOOL: read
|
||||
ARGUMENTS: {invalid json}
|
||||
|
||||
TOOL: write
|
||||
ARGUMENTS: {"filePath": "test.txt"}'''
|
||||
content, tools = parse_tool_calls(text)
|
||||
# Should skip the invalid one and parse the valid one
|
||||
assert tools is not None
|
||||
assert len(tools) == 1
|
||||
assert tools[0]["function"]["name"] == "write"
|
||||
|
||||
|
||||
def test_parse_empty_text():
|
||||
"""Test parsing empty text."""
|
||||
text = ""
|
||||
content, tools = parse_tool_calls(text)
|
||||
assert tools is None
|
||||
assert content == ""
|
||||
|
||||
|
||||
def test_parse_whitespace_only():
|
||||
"""Test parsing whitespace-only text."""
|
||||
text = " \n\t "
|
||||
content, tools = parse_tool_calls(text)
|
||||
assert tools is None
|
||||
|
||||
|
||||
def test_parse_markdown_code_block():
|
||||
"""Test parsing markdown code blocks as fallback (e.g., ```bash command```)."""
|
||||
text = '''I'll help you create a project.
|
||||
|
||||
```bash
|
||||
mkdir myapp
|
||||
cd myapp
|
||||
```
|
||||
|
||||
Now let's create a file.'''
|
||||
content, tools = parse_tool_calls(text)
|
||||
assert tools is not None
|
||||
assert len(tools) == 1
|
||||
assert tools[0]["function"]["name"] == "bash"
|
||||
assert "mkdir myapp" in tools[0]["function"]["arguments"]
|
||||
assert "cd myapp" in tools[0]["function"]["arguments"]
|
||||
|
||||
|
||||
def test_parse_markdown_inline():
|
||||
"""Test parsing inline bash commands in markdown."""
|
||||
text = '''Here's what to do:
|
||||
|
||||
```bash
|
||||
ls -la
|
||||
```'''
|
||||
content, tools = parse_tool_calls(text)
|
||||
assert tools is not None
|
||||
assert len(tools) == 1
|
||||
assert tools[0]["function"]["name"] == "bash"
|
||||
assert "ls -la" in tools[0]["function"]["arguments"]
|
||||
|
||||
|
||||
def test_tool_instructions_content():
|
||||
"""Test that tool instructions contain required sections (REVIEW-2026-02-24 Blocker #4)."""
|
||||
from api.routes import _load_tool_instructions
|
||||
|
||||
# Load instructions from config file
|
||||
instructions = _load_tool_instructions()
|
||||
|
||||
# Verify key instruction components are present (minimal instructions)
|
||||
assert "use tools" in instructions.lower(), "Instructions must mention tool usage"
|
||||
assert "Format" in instructions or "format" in instructions.lower(), "Instructions must mention format"
|
||||
assert "no explanations" in instructions.lower(), "Instructions must forbid explanations"
|
||||
assert "no markdown" in instructions.lower(), "Instructions must forbid markdown"
|
||||
|
||||
|
||||
def test_tool_instructions_token_count():
|
||||
"""Test that tool instructions are within token budget (REVIEW-2026-02-24 Blocker #1)."""
|
||||
from api.routes import _load_tool_instructions
|
||||
|
||||
# Load instructions from config file
|
||||
instructions = _load_tool_instructions()
|
||||
|
||||
# Token budget: 2000 hard limit
|
||||
# Rough estimate: 4 chars = 1 token
|
||||
char_count = len(instructions)
|
||||
estimated_tokens = char_count // 4
|
||||
|
||||
assert estimated_tokens <= 2000, f"Instructions estimated at {estimated_tokens} tokens, must be under 2000"
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
# Run all tests
|
||||
test_functions = [
|
||||
test_parse_simple_tool,
|
||||
test_parse_no_tool,
|
||||
test_parse_multiple_tools,
|
||||
test_parse_tool_with_content_before,
|
||||
test_parse_bash_tool,
|
||||
test_parse_case_insensitive,
|
||||
test_parse_invalid_json,
|
||||
test_parse_empty_text,
|
||||
test_parse_whitespace_only,
|
||||
test_parse_markdown_code_block,
|
||||
test_parse_markdown_inline,
|
||||
test_tool_instructions_content,
|
||||
test_tool_instructions_token_count,
|
||||
]
|
||||
|
||||
passed = 0
|
||||
failed = 0
|
||||
|
||||
for test_func in test_functions:
|
||||
try:
|
||||
test_func()
|
||||
print(f"✓ {test_func.__name__}")
|
||||
passed += 1
|
||||
except AssertionError as e:
|
||||
print(f"✗ {test_func.__name__}: {e}")
|
||||
failed += 1
|
||||
except Exception as e:
|
||||
print(f"✗ {test_func.__name__}: Exception - {e}")
|
||||
failed += 1
|
||||
|
||||
print(f"\n{passed} passed, {failed} failed")
|
||||
|
||||
if failed > 0:
|
||||
sys.exit(1)
|
||||
Reference in New Issue
Block a user