local_swarm/AGENT_WORKER.md

# Agent Worker Rules

> **⚠️ IMPORTANT:** This document is for IMPLEMENTATION AGENTS (coding, testing, documentation).
> **DO NOT MAKE COMMITS** - that's the AGENT_REVIEW.md agent's job.

## Pre-Flight Checklist (MUST complete before coding)

### ⚠️ GIT OPERATIONS REMINDER
**DO NOT make commits.** Commits are ONLY handled by AGENT_REVIEW.md agents.
You CAN create branches and stage files (git add), but DO NOT commit (git commit).

### 1. Token Budget Verification
- [ ] System prompt + instructions ≤ 2000 tokens (hard limit)
- [ ] Leave ≥ 50% of context window for user input
- [ ] If adding documentation/examples, remove old ones to maintain budget
- [ ] Use `tiktoken` or estimate: ~4 chars = 1 token
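
The character-based estimate can be sketched as a small helper. This is only an illustration of the ~4 chars per token heuristic and the 2000-token limit from the checklist; the names `estimate_tokens` and `within_budget` are hypothetical, and a real tokenizer such as `tiktoken` will give different (exact) counts.

```python
import math

CHARS_PER_TOKEN = 4        # rough heuristic from the checklist above
PROMPT_TOKEN_LIMIT = 2000  # hard limit from the checklist above


def estimate_tokens(text: str) -> int:
    """Estimate token count using the ~4 characters per token heuristic."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def within_budget(prompt: str, limit: int = PROMPT_TOKEN_LIMIT) -> bool:
    """Return True if the estimated prompt size fits within the token limit."""
    return estimate_tokens(prompt) <= limit
```

Because the estimate rounds up, it errs on the side of reporting a prompt as larger rather than smaller.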

### 2. Test Plan Required
Before writing ANY code, write a test plan:
```markdown
## Test Plan for [Feature]

### Unit Tests
- [ ] Test case 1: [specific input] → [expected output]
- [ ] Test case 2: [edge case]
- [ ] Test case 3: [error condition]

### Integration Tests
- [ ] End-to-end flow: [steps]
- [ ] Expected result: [what success looks like]

### Manual Verification
- [ ] Command to run: [exact command]
- [ ] Expected output: [what to see]
```

### 3. Design Decision Document
For any change > 50 lines:
```markdown
## Design Decision

### Problem
[What are we solving?]

### Options Considered
1. [Option A] - Pros: ..., Cons: ...
2. [Option B] - Pros: ..., Cons: ...

### Decision
[Which option and WHY]

### Impact
- Token count change: [+/- X tokens]
- Breaking changes: [Yes/No]
- Migration needed: [Yes/No]
```

## Coding Rules

### Rule 1: One Feature = One Commit
**NOTE:** Regular agents DO NOT make commits. AGENT_REVIEW.md agents handle commits.

When AGENT_REVIEW.md agents make commits:
- Never combine unrelated changes in one commit
- If you fix a bug AND refactor, make 2 commits
- Commit message format: `type(scope): description`
  - Types: `feat`, `fix`, `refactor`, `test`, `docs`, `chore`
  - Example: `feat(tools): add working directory support`

### Rule 2: Tests First (TDD)
```python
# BAD: Write code, maybe test later
def parse_tools(text):
    # ... implementation ...
    pass

# GOOD: Write test first
def test_parse_simple_tool():
    text = 'TOOL: read\nARGUMENTS: {"filePath": "test.txt"}'
    content, tools = parse_tool_calls(text)
    assert len(tools) == 1
    assert tools[0]["function"]["name"] == "read"

# Then write minimal code to pass
```

### Rule 3: Minimal, Maintainable, Modular Code
**Core Focus:** Keep code minimal, maintainable, and modular.

#### Minimal
- Write only the code needed to solve the problem
- Avoid unnecessary abstractions or over-engineering
- Keep functions small and focused (max 50 lines)
- Prefer simple solutions over complex ones
- Remove dead code and unused imports immediately

#### Maintainable
- Clear, descriptive variable and function names
- One concept per file/module
- Self-documenting code with minimal comments
- Consistent code style throughout
- Easy to understand for future maintainers

#### Modular
- Single Responsibility Principle: One purpose per module/function
- Loose coupling between components
- Clear, stable interfaces between modules
- Easy to test in isolation
- Reusable components where appropriate

```python
# BAD: Monolithic, complex, hard to maintain
def process_user_request(request_data, validate=True, save=True, notify=True, format_output=False):
    # 200+ lines doing everything
    validation_result = validate_request(request_data)
    if validation_result.is_valid:
        if save:
            db_connection = get_db_connection()
            cursor = db_connection.cursor()
            cursor.execute("INSERT INTO requests ...", request_data)
            db_connection.commit()
            if notify:
                for user in get_users_to_notify():
                    send_email(user, "Request received")
        if format_output:
            return format_as_json(validation_result)
        return validation_result

# GOOD: Minimal, modular, maintainable
from typing import List

def validate_request(data: dict) -> ValidationResult:
    """Validate request data."""
    return ValidationResult(is_valid=len(data) > 0)

def save_request(data: dict) -> str:
    """Save request to database."""
    return db.insert("requests", data)

def notify_users(request_id: str, users: List[str]) -> None:
    """Notify users about request."""
    for user in users:
        send_email(user, f"Request {request_id} received")
```

### Rule 4: No Production Debugging
- NEVER add `print()` statements for debugging
- Use `logging` module with appropriate levels
- Remove ALL debug logging before committing
- Exception: Structured logging for observability (metrics, errors)

```python
# BAD
def process_request(request):
    print(f"DEBUG: Got request {request}")  # REMOVE THIS
    result = handle(request)
    print(f"DEBUG: Result {result}")  # REMOVE THIS
    return result

# GOOD
import logging

logger = logging.getLogger(__name__)

def process_request(request):
    logger.debug("Processing request", extra={"request_id": request.id})
    result = handle(request)
    return result
```

### Rule 4: Architecture Consistency
- Check ARCHITECTURE.md before changing patterns
- If unsure, ask in PR description
- NEVER change architecture in a "fix" commit
- Architecture changes require design doc + team review

### Rule 5: Parse Once, Parse Well
- ONE parser per format
- If adding new format, remove old one
- Parser must handle all documented cases
- Parser must fail gracefully (return empty, not crash)

```python
# BAD: Multiple parsers for same thing
def parse_tools_v1(text): ...
def parse_tools_v2(text): ...
def parse_tools_legacy(text): ...

# GOOD: Single parser with clear regex
import re
from typing import List, Tuple

TOOL_PATTERN = r'TOOL:\s*(\w+)\s*\nARGUMENTS:\s*(\{[^}]*\})'

def parse_tool_calls(text: str) -> Tuple[str, List[dict]]:
    matches = list(re.finditer(TOOL_PATTERN, text, re.IGNORECASE))
    if not matches:
        return text, []
    # ... rest of parsing ...
```
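
One way the elided parsing step might be completed is sketched below. The tool-call dictionary shape (`function.name` plus an `arguments` key) and the choice to return the text with parsed tool blocks removed are assumptions made for illustration, not the project's confirmed format. Note that `[^}]*` in the pattern cannot match nested JSON objects.

```python
import json
import re
from typing import List, Tuple

TOOL_PATTERN = r'TOOL:\s*(\w+)\s*\nARGUMENTS:\s*(\{[^}]*\})'


def parse_tool_calls(text: str) -> Tuple[str, List[dict]]:
    """Return (remaining content, tool calls); never raises on bad input."""
    tools: List[dict] = []
    kept: List[str] = []
    last = 0
    for match in re.finditer(TOOL_PATTERN, text, re.IGNORECASE):
        try:
            arguments = json.loads(match.group(2))
        except json.JSONDecodeError:
            continue  # fail gracefully: leave malformed blocks in the content
        # Assumed output shape: {"function": {"name": ..., "arguments": ...}}
        tools.append({"function": {"name": match.group(1), "arguments": arguments}})
        kept.append(text[last:match.start()])
        last = match.end()
    if not tools:
        return text, []
    kept.append(text[last:])
    return "".join(kept).strip(), tools
```

Malformed argument blocks are skipped rather than raised, which matches the "return empty, not crash" requirement.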

### Rule 6: Token-Aware Documentation
- Every docstring/example has a token cost
- Count tokens before adding
- If over budget, remove something else
- Prioritize: Code clarity > Examples > Explanations

```python
# BAD: 150 tokens of fluff
def calculate(x, y):
    """
    This function calculates the sum of two numbers.

    The sum is calculated by using the built-in Python
    addition operator which adds the values together.

    Args:
        x (int): The first number to add
        y (int): The second number to add

    Returns:
        int: The sum of x and y

    Example:
        >>> calculate(1, 2)
        3
    """
    return x + y

# GOOD: 20 tokens, clear enough
def calculate(x: int, y: int) -> int:
    """Return sum of x and y."""
    return x + y
```

### Rule 7: Clear Error Messages
- Every error must tell user EXACTLY what went wrong
- Include context: what was expected vs what was received
- Suggest fix if possible

```python
# BAD
raise ValueError("Invalid input")

# GOOD
raise ValueError(f"Invalid model format: '{model_str}'. Expected: 'name:size:quant' (e.g., 'qwen:7b:q4')")
```
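
As a fuller sketch of the same idea, a validator can state what was expected, what was received, and how to fix it. The function name `parse_model_spec` is hypothetical, and the rule that all three fields must be non-empty is an assumption.

```python
from typing import Tuple


def parse_model_spec(model_str: str) -> Tuple[str, str, str]:
    """Split 'name:size:quant' into its three parts."""
    parts = model_str.split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f"Invalid model format: '{model_str}'. "
            "Expected three non-empty fields 'name:size:quant' "
            f"(e.g., 'qwen:7b:q4'), received fields: {parts}"
        )
    name, size, quant = parts
    return name, size, quant
```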

### Rule 8: No Circular Imports
```python
# BAD: src/a.py imports src/b.py, src/b.py imports src/a.py

# GOOD: Use dependency injection or move shared code to common module
```
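
A minimal sketch of the dependency injection approach follows, shown in a single file for brevity. The class names (`Notifier`, `RequestService`, `InMemoryNotifier`) are hypothetical; the comments indicate where each piece could live so that neither module imports the other.

```python
from typing import List, Protocol


class Notifier(Protocol):
    """Interface that a shared module (e.g. src/common/) could own."""

    def send(self, message: str) -> None: ...


class RequestService:
    """Could live in src/a.py; depends on the interface, not on src/b.py."""

    def __init__(self, notifier: Notifier) -> None:
        self._notifier = notifier

    def submit(self, request_id: str) -> None:
        self._notifier.send(f"Request {request_id} received")


class InMemoryNotifier:
    """Could live in src/b.py; it never needs to import RequestService."""

    def __init__(self) -> None:
        self.sent: List[str] = []

    def send(self, message: str) -> None:
        self.sent.append(message)


# Wiring happens at the edge of the application, not inside either module
notifier = InMemoryNotifier()
RequestService(notifier).submit("42")
```

Because `RequestService` receives its collaborator as an argument, it can also be tested in isolation with a fake notifier.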

## Git Workflow Rules

### CRITICAL: Commit Handling

**REGULAR AGENTS: DO NOT MAKE COMMITS**
- Regular agents do NOT create commits, pull requests, or manage git history
- Commits are ONLY handled by agents following AGENT_REVIEW.md guidelines
- If you need to commit code, the AGENT_REVIEW.md agent should handle it
- Exception: You may manually stage files (git add) for the review agent
- **You CAN create and checkout branches** (that's fine) - just don't commit to them

### Branch Strategy

**Main Branches (Protected):**
- `main` - Production-ready code only
- `develop` - Integration branch for features (optional for small projects)

**Working Branches (Temporary):**
```
feature/description           # New features
fix/description               # Bug fixes
refactor/description          # Code refactoring
hotfix/description            # Critical production fixes
docs/description              # Documentation only
experiment/description        # Experimental work (may be deleted)
```

**Note:** Regular agents may create and check out these branches and stage files, but must NOT commit, push, or merge. Those operations belong to AGENT_REVIEW.md agents.

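### Workflow Steps

**Note:** These steps include commits, pushes, rebases, and PRs, so they apply to AGENT_REVIEW.md agents. Regular agents only create the branch and stage files.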

#### 1. Starting New Work
```bash
# ALWAYS start from main
git checkout main
git pull origin main

# Create feature branch
git checkout -b feature/description

# Push branch to remote immediately
git push -u origin feature/description
```

#### 2. During Development
```bash
# Commit often (small, logical commits)
git add -p  # Stage interactively (review each change)
git commit -m "feat(scope): description"

# Push regularly (backup)
git push origin feature/description

# Keep up-to-date with main
git fetch origin
git rebase origin/main  # Resolve conflicts immediately
```

#### 3. Before PR (Final Cleanup)
```bash
# Interactive rebase to clean history
git rebase -i main

# Squash these:
# - "fix typo"
# - "WIP"
# - "asdf"
# - "omg finally"
# - Multiple attempts at same fix

# Keep separate:
# - Logical feature steps
# - Refactoring separate from features
# - Test additions separate from code changes
```

#### 4. Creating PR
- Push final branch: `git push origin feature/description`
- Create PR to `main` (not develop unless project uses git-flow)
- Fill PR template completely
- Request review from a reviewer who follows AGENT_REVIEW.md
- Link related issues: `Closes #123`, `Fixes #456`

### Commit Rules

**Commit Frequency:**
- Commit after each logical step (not just at end of day)
- Each commit should leave codebase in working state
- "Work in progress" commits OK on feature branches (clean before PR)

**Commit Size:**
- Max 200 lines changed per commit
- Max 5 files changed per commit (unless related)
- Each commit reviewable in 5 minutes
- Split large changes:
  ```bash
  # BAD: One giant commit
  git commit -am "Add federation + fix bugs + refactor + docs"

  # GOOD: Separate commits
  git commit -m "refactor(network): extract peer discovery logic"
  git commit -m "feat(federation): implement cross-swarm voting"
  git commit -m "fix(federation): handle peer timeout edge case"
  git commit -m "docs: update federation architecture docs"
  ```
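
The size limits can be checked mechanically before committing. The sketch below parses the text produced by `git diff --cached --numstat` (tab-separated added, deleted, and path columns). The function names are hypothetical, and the file limit is treated as strict here even though the rule allows exceptions for related files.

```python
from typing import Tuple

MAX_LINES = 200  # limit from the Commit Size rules above
MAX_FILES = 5    # limit from the Commit Size rules above


def summarize_numstat(numstat: str) -> Tuple[int, int]:
    """Return (files changed, lines changed) from `git diff --numstat` output."""
    files = 0
    lines = 0
    for row in numstat.splitlines():
        if not row.strip():
            continue
        added, deleted, _path = row.split("\t", 2)
        files += 1
        # Binary files are reported as "-" and contribute no line count
        lines += int(added) if added.isdigit() else 0
        lines += int(deleted) if deleted.isdigit() else 0
    return files, lines


def commit_too_large(numstat: str) -> bool:
    """Return True if the staged change exceeds either size limit."""
    files, lines = summarize_numstat(numstat)
    return files > MAX_FILES or lines > MAX_LINES
```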

**Commit Message Format:**
```
type(scope): subject (50 chars or less)

Body (wrap at 72 chars):
- Why this change was made
- What problem it solves
- Any breaking changes or migration notes

Refs: #123, #456
```

**Types:**
- `feat`: New feature
- `fix`: Bug fix
- `refactor`: Code restructuring (no behavior change)
- `test`: Adding/updating tests
- `docs`: Documentation only
- `chore`: Build, dependencies, tooling
- `perf`: Performance improvement
- `style`: Formatting (no code change)

**Subject Rules:**
- Use imperative mood: "Add feature" not "Added feature"
- No period at end
- Lowercase after type
- Max 50 characters
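
Most of these rules can be checked automatically; imperative mood cannot. The following is a rough sketch, assuming the 50-character limit applies to the whole first line and that the scope is optional (as in `docs: ...`). The function name `check_commit_subject` is hypothetical.

```python
import re
from typing import List

COMMIT_TYPES = ("feat", "fix", "refactor", "test", "docs", "chore", "perf", "style")
SUBJECT_PATTERN = re.compile(
    r"^(?P<type>[a-z]+)(\((?P<scope>[^()]+)\))?: (?P<subject>.+)$"
)
MAX_SUBJECT_LENGTH = 50  # assumption: counted over the whole first line


def check_commit_subject(line: str) -> List[str]:
    """Return a list of rule violations; an empty list means the line passes."""
    match = SUBJECT_PATTERN.match(line)
    if match is None:
        return ["Expected format 'type(scope): subject'"]
    problems: List[str] = []
    commit_type = match.group("type")
    subject = match.group("subject")
    if commit_type not in COMMIT_TYPES:
        problems.append(f"Unknown type '{commit_type}'")
    if subject[0].isupper():
        problems.append("Subject must start with a lowercase letter")
    if subject.endswith("."):
        problems.append("Subject must not end with a period")
    if len(line) > MAX_SUBJECT_LENGTH:
        problems.append(f"Line is {len(line)} chars, max is {MAX_SUBJECT_LENGTH}")
    return problems
```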

### Branch Hygiene

**DO:**
- Create branch from latest main
- Use descriptive branch names
- Push branch to remote immediately
- Rebase onto main regularly
- Delete merged branches
- Squash fixup commits before PR

**DON'T:**
- Commit directly to main
- Have long-lived branches (>1 week without rebase)
- Include unrelated changes in one branch
- Commit broken code (even temporarily)
- Force push to shared branches
- Merge without review

### Handling Conflicts

```bash
# While rebasing
git rebase main
# Conflicts happen...

# Resolve conflicts in files
git add <resolved-files>
git rebase --continue

# If messed up, abort
git rebase --abort
```

**Conflict Resolution Rules:**
1. Understand both changes before resolving
2. Don't just pick "ours" or "theirs"
3. Test after resolving
4. Commit message should explain resolution

### Emergency Procedures

**Committed to wrong branch:**
```bash
# Undo last commit (keep changes)
git reset HEAD~1

# Stash changes (-u also stashes files that became untracked)
git stash -u

# Switch to correct branch
git checkout correct-branch

# Apply changes
git stash pop

# Stage and commit properly
git add <files>
git commit -m "..."
```

**Need to undo pushed commit:**
```bash
# Revert (creates new commit, safe for shared history)
git revert <commit-hash>
git push origin branch-name

# OR if feature branch not shared yet
# Reset and force push (DANGEROUS)
git reset --hard HEAD~1
git push --force-with-lease origin branch-name
```

### Release Process

**NOTE:** Release process should be handled by AGENT_REVIEW.md agents.

```bash
# Create release branch
git checkout -b release/v1.2.0

# Bump version, update changelog, then stage and commit
git add <changed-files>
git commit -m "chore: bump version to 1.2.0"

# Tag release
git tag -a v1.2.0 -m "Release version 1.2.0"
git push origin v1.2.0

# Merge to main
git checkout main
git merge --no-ff release/v1.2.0
git push origin main

# Delete release branch
git branch -d release/v1.2.0
```

### What Regular Agents Should NOT Do

**REGULAR AGENTS DO NOT:**
- Make commits (git commit)
- Create pull requests
- Push to remote repositories
- Merge branches
- Manage git history (rebase, reset, etc.)
- Delete branches

**REGULAR AGENTS CAN:**
- Create and checkout branches (git checkout -b)
- Stage files for review (git add)
- Switch between branches

**REGULAR AGENTS SHOULD:**
- Write code and tests
- Run tests locally
- Use logging instead of print()
- Follow code quality standards
- Document changes in code comments or design docs
- Hand off completed work to AGENT_REVIEW.md agent for commit/PR creation

**Example Workflow:**
```
1. Agent reads task from user
2. Agent creates feature branch (git checkout -b feature/name)
3. Agent implements feature (writes code, tests, docs)
4. Agent stages changes for review (git add)
5. Agent reports completion with summary of changes
6. AGENT_REVIEW.md agent:
   - Reviews code quality
   - Makes commits
   - Creates PR
```

### Pre-Commit Checklist
- [ ] Code passes `pytest` (if tests exist)
- [ ] No `print()` statements (use logging)
- [ ] No bare `except:` clauses
- [ ] All functions have type hints
- [ ] All public functions have docstrings
- [ ] No TODO comments (create issues instead)
- [ ] Token count checked (if modifying prompts)
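
Two of these checks (no `print()` calls, no bare `except:`) can be automated with the standard `ast` module. This is a minimal sketch with a hypothetical function name; it only detects direct calls to the name `print` and does not replace a linter.

```python
import ast
from typing import List, Tuple


def find_violations(source: str) -> List[str]:
    """Report print() calls and bare except clauses in Python source."""
    found: List[Tuple[int, str]] = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "print"
        ):
            found.append((node.lineno, "print() call"))
        elif isinstance(node, ast.ExceptHandler) and node.type is None:
            found.append((node.lineno, "bare except"))
    # Sort numerically by line so the report reads top to bottom
    return [f"line {lineno}: {message}" for lineno, message in sorted(found)]
```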

## Testing Requirements

### Unit Test Coverage
Minimum 80% coverage for:
- Parsing functions
- Business logic
- State machines

### Integration Tests Required For:
- API endpoints
- Tool execution
- File operations
- Network calls (mocked)
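
A mocked network call might look like the sketch below, using only the standard library. `fetch_status` and the URL are hypothetical stand-ins for real project code; the point is that the test never opens a socket.

```python
import json
import urllib.request
from unittest.mock import MagicMock, patch


def fetch_status(url: str) -> str:
    """Hypothetical helper: GET a JSON document and return its 'status' field."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return json.loads(response.read())["status"]


def test_fetch_status_uses_mocked_network() -> None:
    fake_response = MagicMock()
    fake_response.read.return_value = b'{"status": "ok"}'
    # urlopen() is used as a context manager, so configure __enter__
    fake_response.__enter__.return_value = fake_response
    with patch("urllib.request.urlopen", return_value=fake_response) as mocked:
        assert fetch_status("http://peer.invalid/health") == "ok"
    mocked.assert_called_once_with("http://peer.invalid/health", timeout=5)
```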

### Test File Structure
```
tests/
├── unit/
│   ├── test_parser.py
│   ├── test_executor.py
│   └── test_consensus.py
├── integration/
│   ├── test_api.py
│   └── test_tools.py
└── fixtures/
    └── sample_responses.json
```

## Code Quality Standards

### Python Style
- Follow PEP 8
- Use type hints for all function signatures
- Max line length: 100 characters
- Max function length: 50 lines
- Max file length: 300 lines (split if larger)

### Imports (Order Matters)
```python
# 1. Standard library
import os
import sys
from typing import List

# 2. Third party
import numpy as np
from fastapi import APIRouter

# 3. Local (absolute imports only)
from src.tools.executor import ToolExecutor
from src.swarm.manager import SwarmManager
```

### Documentation Standards
Every module must have:
```python
"""Module purpose in one line.

Longer description if needed (2-3 sentences max).
"""
```

Every public function must have:
```python
def process_data(data: dict, options: Optional[dict] = None) -> Result:
    """Process data with given options.

    Args:
        data: Input data to process
        options: Processing options (default: None)

    Returns:
        Processed result

    Raises:
        ValueError: If data is invalid
    """
```

## Architecture Rules

### No Feature Flags in Core Logic
```python
# BAD
if config.get("USE_NEW_PARSER", False):
    result = new_parser(text)
else:
    result = old_parser(text)

# GOOD: Pick one, remove the other
def parse_tool_calls(text: str) -> Tuple[str, List[dict]]:
    """Parse tool calls from text."""
    # Single implementation
```

### No Code Duplication
- If you copy-paste > 3 lines, extract to function
- Shared code goes in `src/common/` or `src/utils/`

### Separation of Concerns
```
src/
├── parser/       # Only parsing logic
├── executor/     # Only execution logic
├── formatter/    # Only formatting/output
└── integration/  # Only API glue code
```

## Forbidden Patterns

### Never Do These:
1. **Bare except clauses** - Always catch specific exceptions
2. **Production debugging** - No `print()`, use logging
3. **Multiple return formats** - One function = one return type
4. **Silent failures** - Always log/report errors
5. **Magic numbers** - Use named constants
6. **Global state** - Use dependency injection
7. **Deep nesting** - Max 3 levels of indentation
8. **Circular dependencies** - Re-architect if needed

## Review Preparation

Before marking PR ready:

1. **Self-Review Checklist** (check each item):
   - [ ] Tests pass: `pytest -v`
   - [ ] Type checking: `mypy src/`
   - [ ] Linting: `ruff check src/`
   - [ ] Formatting: `black src/`
   - [ ] Token count verified (if applicable)
   - [ ] No debug code left in
   - [ ] Commit messages follow format
   - [ ] Documentation updated

2. **PR Description Template**:
   ```markdown
   ## Changes
   - [Brief description]

   ## Testing
   - [How you tested it]

   ## Token Impact (if applicable)
   - Before: X tokens
   - After: Y tokens
   - Change: +/- Z tokens

   ## Checklist
   - [ ] Tests added/updated
   - [ ] Documentation updated
   - [ ] Self-review completed
   ```

3. **Run Final Verification**:
   ```bash
   # Run all checks
   pytest && mypy src/ && ruff check src/ && black --check src/
   ```

## Continuous Learning & Research

You MUST periodically research best practices and alternative implementations. This prevents stagnation and ensures we're using proven approaches.

### When to Research

**Before Major Features:**
- Spend 15-30 minutes researching similar implementations
- Check: GitHub, Stack Overflow, official docs, research papers
- Document findings in PR description

**Monthly Reviews:**
- Review project's core technologies for updates
- Check if better libraries/algorithms exist
- Look for deprecated patterns we're using

**When Stuck:**
- Don't brute force a solution
- Research how others solved similar problems
- Consider if problem indicates architectural issue

### What to Research

**1. Best Practices**
```
# Search queries to use:
"python async best practices 2024"
"fastapi error handling patterns"
"LLM consensus voting algorithms"
"gguf quantization comparison"
```

**2. Similar Implementations**
- Search GitHub for similar projects
- Read their architecture decisions
- Check their issues for pitfalls they hit
- Note: Don't copy code blindly, understand WHY

**3. Research Papers & Benchmarks**
- For consensus algorithms
- For quantization strategies
- For context window optimization
- For distributed systems patterns

**4. Library Updates**
- Check CHANGELOG of major dependencies
- Review migration guides
- Test new features in separate branch

### Documentation of Research

Create `research/YYYY-MM-DD-topic.md` for significant findings:

```markdown
# Research: [Topic]

**Date:** YYYY-MM-DD
**Researcher:** [Name]
**Trigger:** [Why researched this]

## Findings

### Option 1: [Name]
- Source: [Link]
- Pros: ...
- Cons: ...
- Complexity: Low/Medium/High

### Option 2: [Name]
- Source: [Link]
- Pros: ...
- Cons: ...
- Complexity: Low/Medium/High

## Recommendation
[Which option and WHY]

## Implementation Notes
[Specific code changes needed]

## Risks
[What could go wrong]
```
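
The `research/YYYY-MM-DD-topic.md` naming can be generated consistently with a small helper. The slug rule used here (lowercase, non-alphanumeric runs collapsed to hyphens) is an assumption, as is the function name.

```python
import re
from datetime import date
from pathlib import PurePosixPath


def research_note_path(topic: str, on: date) -> PurePosixPath:
    """Build research/YYYY-MM-DD-topic.md from a free-form topic."""
    # Assumed slug rule: lowercase, collapse anything else into single hyphens
    slug = re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-")
    return PurePosixPath("research") / f"{on.isoformat()}-{slug}.md"
```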

### Research Checklist

**Before implementing:**
- [ ] Searched for similar open-source implementations
- [ ] Checked recent best practices (2023+)
- [ ] Looked for benchmarking data if applicable
- [ ] Reviewed alternative approaches
- [ ] Considered long-term maintenance implications

**After implementing:**
- [ ] Documented why chosen approach was selected
- [ ] Added comments linking to research sources
- [ ] Created test comparing against alternatives (if applicable)

### Example Research Topics

**Immediate:**
- "Python type hints best practices 2024"
- "FastAPI dependency injection patterns"
- "LLM tool use format comparison"

**Short-term:**
- "Consensus algorithms for distributed LLM systems"
- "Context window compression techniques"
- "GGUF quantization vs other formats"

**Long-term:**
- "Speculative decoding implementation"
- "PagedAttention for multiple workers"
- "RAG integration patterns"

### Research Sources

**Reliable:**
- Official documentation (Python, FastAPI, etc.)
- Well-maintained GitHub repos (>1k stars, active)
- Recent conference talks (PyCon, NeurIPS, etc.)
- Research papers with code (Papers With Code)
- Official blogs (Python.org, FastAPI.tiangolo.com)

**Use with Caution:**
- Medium articles (variable quality)
- Old Stack Overflow answers (>2 years)
- Tutorial sites (often outdated)
- YouTube videos (hard to verify)

### Integration with Development

**Weekly:**
- Spend 30 minutes reading about one technology we use
- Note any improvements we could make
- Create issues for promising findings

**Monthly:**
- Review all open research issues
- Prioritize based on impact vs effort
- Schedule implementation of high-value items

**Quarterly:**
- Architecture review: Are our patterns still best?
- Dependency audit: Updates needed?
- Performance review: Could we be faster?

---

**Remember:**
- Research prevents reinvention of the wheel
- But don't research forever - timebox it (30 min max for most decisions)
- Document findings so others don't repeat the research
- Apply critical thinking - "best practice" depends on context

---

## Breaking This Ruleset

If you MUST break a rule:
1. Document WHY in code comments
2. Get explicit approval in PR
3. Create follow-up issue to fix properly
4. Never break Rule 4 (No Production Debugging)

---

**Remember: Quality over speed. A fix that takes 2 days with tests is better than a fix that takes 2 hours and breaks 3 other things.**