local_swarm/AGENT_WORKER.md
sleepy d22c52ec04 docs: Add minimal, maintainable, modular code requirements
- AGENT_WORKER.md: Added Rule 3 for minimal, maintainable, modular code
- AGENT_REVIEW.md: Added strict enforcement check in Phase 2
- Emphasizes single responsibility, clean interfaces, and production quality
- Reviewers must block code that doesn't meet these standards
2026-02-25 12:30:18 +01:00


Agent Worker Rules

⚠️ IMPORTANT: This document is for IMPLEMENTATION AGENTS (coding, testing, documentation). DO NOT MAKE COMMITS - that's the AGENT_REVIEW.md agent's job.

Pre-Flight Checklist (MUST complete before coding)

⚠️ GIT OPERATIONS REMINDER

DO NOT make commits. Commits are ONLY handled by AGENT_REVIEW.md agents. You CAN create branches and stage files (git add), but DO NOT commit (git commit).

1. Token Budget Verification

  • System prompt + instructions ≤ 2000 tokens (hard limit)
  • Leave ≥ 50% of context window for user input
  • If adding documentation/examples, remove old ones to maintain budget
  • Use tiktoken or estimate: ~4 chars = 1 token
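The ~4 chars per token heuristic can be wrapped in a small budget-check helper. This is a rough sketch (the function names are illustrative); swap in a real tiktoken encoding when exact counts matter:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters per token heuristic."""
    return max(1, len(text) // 4)

def within_budget(prompt: str, limit: int = 2000) -> bool:
    """Check the 2000-token hard limit before adding instructions."""
    return estimate_tokens(prompt) <= limit
```
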

2. Test Plan Required

Before writing ANY code, write a test plan:

## Test Plan for [Feature]

### Unit Tests
- [ ] Test case 1: [specific input] → [expected output]
- [ ] Test case 2: [edge case]
- [ ] Test case 3: [error condition]

### Integration Tests  
- [ ] End-to-end flow: [steps]
- [ ] Expected result: [what success looks like]

### Manual Verification
- [ ] Command to run: [exact command]
- [ ] Expected output: [what to see]

3. Design Decision Document

For any change > 50 lines:

## Design Decision

### Problem
[What are we solving?]

### Options Considered
1. [Option A] - Pros: ..., Cons: ...
2. [Option B] - Pros: ..., Cons: ...

### Decision
[Which option and WHY]

### Impact
- Token count change: [+/- X tokens]
- Breaking changes: [Yes/No]
- Migration needed: [Yes/No]

Coding Rules

Rule 1: One Feature = One Commit

NOTE: Regular agents DO NOT make commits. AGENT_REVIEW.md agents handle commits.

When AGENT_REVIEW.md agents make commits:

  • Never combine unrelated changes in one commit
  • If you fix a bug AND refactor, make 2 commits
  • Commit message format: type(scope): description
    • Types: feat, fix, refactor, test, docs, chore
    • Example: feat(tools): add working directory support

Rule 2: Tests First (TDD)

# BAD: Write code, maybe test later
def parse_tools(text):
    # ... implementation ...
    pass

# GOOD: Write test first
def test_parse_simple_tool():
    text = 'TOOL: read\nARGUMENTS: {"filePath": "test.txt"}'
    content, tools = parse_tool_calls(text)
    assert len(tools) == 1
    assert tools[0]["function"]["name"] == "read"

# Then write minimal code to pass

Rule 3: Minimal, Maintainable, Modular Code

Core Focus: Keep code minimal, maintainable, and modular.

Minimal

  • Write only the code needed to solve the problem
  • Avoid unnecessary abstractions or over-engineering
  • Keep functions small and focused (max 50 lines)
  • Prefer simple solutions over complex ones
  • Remove dead code and unused imports immediately

Maintainable

  • Clear, descriptive variable and function names
  • One concept per file/module
  • Self-documenting code with minimal comments
  • Consistent code style throughout
  • Easy to understand for future maintainers

Modular

  • Single Responsibility Principle: One purpose per module/function
  • Loose coupling between components
  • Clear, stable interfaces between modules
  • Easy to test in isolation
  • Reusable components where appropriate

# BAD: Monolithic, complex, hard to maintain
def process_user_request(request_data, validate=True, save=True, notify=True, format_output=False):
    # 200+ lines doing everything
    validation_result = validate_request(request_data)
    if validation_result.is_valid:
        if save:
            db_connection = get_db_connection()
            cursor = db_connection.cursor()
            cursor.execute("INSERT INTO requests ...", request_data)
            db_connection.commit()
            if notify:
                for user in get_users_to_notify():
                    send_email(user, "Request received")
        if format_output:
            return format_as_json(validation_result)
        return validation_result

# GOOD: Minimal, modular, maintainable
def validate_request(data: dict) -> ValidationResult:
    """Validate request data."""
    return ValidationResult(is_valid=len(data) > 0)

def save_request(data: dict) -> str:
    """Save request to database."""
    return db.insert("requests", data)

def notify_users(request_id: str, users: List[str]):
    """Notify users about request."""
    for user in users:
        send_email(user, f"Request {request_id} received")

Rule 4: No Production Debugging

  • NEVER add print() statements for debugging
  • Use logging module with appropriate levels
  • Remove ALL debug logging before committing
  • Exception: Structured logging for observability (metrics, errors)

# BAD
def process_request(request):
    print(f"DEBUG: Got request {request}")  # REMOVE THIS
    result = handle(request)
    print(f"DEBUG: Result {result}")  # REMOVE THIS
    return result

# GOOD
def process_request(request):
    logger.debug("Processing request", extra={"request_id": request.id})
    result = handle(request)
    return result
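A minimal setup for the logging pattern above. The `configure_logging` helper and format string are illustrative choices, not project requirements:

```python
import logging

# Module-level logger; modules only call getLogger, never basicConfig.
logger = logging.getLogger(__name__)

def configure_logging(level: int = logging.INFO) -> None:
    """Call once at the application entry point to configure handlers."""
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )
```
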

Rule 5: Architecture Consistency

  • Check ARCHITECTURE.md before changing patterns
  • If unsure, ask in PR description
  • NEVER change architecture in a "fix" commit
  • Architecture changes require design doc + team review

Rule 6: Parse Once, Parse Well

  • ONE parser per format
  • If adding new format, remove old one
  • Parser must handle all documented cases
  • Parser must fail gracefully (return empty, not crash)

# BAD: Multiple parsers for same thing
def parse_tools_v1(text): ...
def parse_tools_v2(text): ...
def parse_tools_legacy(text): ...

# GOOD: Single parser with clear regex
TOOL_PATTERN = r'TOOL:\s*(\w+)\s*\nARGUMENTS:\s*(\{[^}]*\})'

def parse_tool_calls(text: str) -> Tuple[str, List[dict]]:
    matches = list(re.finditer(TOOL_PATTERN, text, re.IGNORECASE))
    if not matches:
        return text, []
    # ... rest of parsing ...
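A fuller, self-contained sketch of the single-parser rule. The graceful-failure behavior shown here (skip malformed JSON, never raise) is one reasonable reading of the rule above, not the project's actual implementation:

```python
import json
import re
from typing import List, Tuple

TOOL_PATTERN = r'TOOL:\s*(\w+)\s*\nARGUMENTS:\s*(\{[^}]*\})'

def parse_tool_calls(text: str) -> Tuple[str, List[dict]]:
    """Extract tool calls from text; return (remaining_text, tool_list)."""
    matches = list(re.finditer(TOOL_PATTERN, text, re.IGNORECASE))
    if not matches:
        return text, []  # fail gracefully: no matches, no crash
    tools = []
    for m in matches:
        try:
            args = json.loads(m.group(2))
        except json.JSONDecodeError:
            continue  # malformed arguments: skip this call, keep going
        tools.append({"function": {"name": m.group(1), "arguments": args}})
    remaining = re.sub(TOOL_PATTERN, "", text, flags=re.IGNORECASE).strip()
    return remaining, tools
```
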

Rule 7: Token-Aware Documentation

  • Every docstring/example has a token cost
  • Count tokens before adding
  • If over budget, remove something else
  • Prioritize: Code clarity > Examples > Explanations

# BAD: 150 tokens of fluff
def calculate(x, y):
    """
    This function calculates the sum of two numbers.
    
    The sum is calculated by using the built-in Python 
    addition operator which adds the values together.
    
    Args:
        x (int): The first number to add
        y (int): The second number to add
        
    Returns:
        int: The sum of x and y
        
    Example:
        >>> calculate(1, 2)
        3
    """
    return x + y

# GOOD: 20 tokens, clear enough
def calculate(x: int, y: int) -> int:
    """Return sum of x and y."""
    return x + y

Rule 8: Clear Error Messages

  • Every error must tell user EXACTLY what went wrong
  • Include context: what was expected vs what was received
  • Suggest fix if possible

# BAD
raise ValueError("Invalid input")

# GOOD
raise ValueError(f"Invalid model format: '{model_str}'. Expected: 'name:size:quant' (e.g., 'qwen:7b:q4')")

Rule 9: No Circular Imports

# BAD: src/a.py imports src/b.py, src/b.py imports src/a.py

# GOOD: Use dependency injection or move shared code to common module
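One way to break a cycle, sketched with hypothetical modules: move the shared type into a common module and inject the dependency as a callable instead of importing it:

```python
from dataclasses import dataclass
from typing import Callable

# common.py: shared type both sides depend on (no cycle)
@dataclass
class Event:
    name: str

# b.py: takes the handler as a parameter instead of importing a.py
def dispatch(event: Event, handler: Callable[[Event], str]) -> str:
    return handler(event)

# a.py: supplies its handler at call time (dependency injection)
def handle(event: Event) -> str:
    return f"handled {event.name}"
```
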

Git Workflow Rules

CRITICAL: Commit Handling

REGULAR AGENTS: DO NOT MAKE COMMITS

  • Regular agents do NOT create commits, pull requests, or manage git history
  • Commits are ONLY handled by agents following AGENT_REVIEW.md guidelines
  • If you need to commit code, the AGENT_REVIEW.md agent should handle it
  • Exception: You may manually stage files (git add) for the review agent
  • You CAN create and checkout branches (that's fine) - just don't commit to them

Branch Strategy

Main Branches (Protected):

  • main - Production-ready code only
  • develop - Integration branch for features (optional for small projects)

Working Branches (Temporary - AGENT_REVIEW.md ONLY):

feature/description           # New features
fix/description               # Bug fixes  
refactor/description          # Code refactoring
hotfix/description            # Critical production fixes
docs/description              # Documentation only
experiment/description        # Experimental work (may be deleted)

Note: Regular agents MAY create and checkout branches, but must not commit to them; all other git operations belong to AGENT_REVIEW.md agents

Workflow Steps

1. Starting New Work

# ALWAYS start from main
git checkout main
git pull origin main

# Create feature branch
git checkout -b feature/description

# Push branch to remote immediately
git push -u origin feature/description

2. During Development

# Commit often (small, logical commits)
git add -p  # Stage interactively (review each change)
git commit -m "feat(scope): description"

# Push regularly (backup)
git push origin feature/description

# Keep up-to-date with main
git fetch origin
git rebase origin/main  # Resolve conflicts immediately

3. Before PR (Final Cleanup)

# Interactive rebase to clean history
git rebase -i main

# Squash these:
# - "fix typo"
# - "WIP"
# - "asdf"
# - "omg finally"
# - Multiple attempts at same fix

# Keep separate:
# - Logical feature steps
# - Refactoring separate from features
# - Test additions separate from code changes

4. Creating PR

  • Push final branch: git push origin feature/description
  • Create PR to main (not develop unless project uses git-flow)
  • Fill PR template completely
  • Request review from AGENT_REVIEW.md qualified reviewer
  • Link related issues: Closes #123, Fixes #456

Commit Rules

Commit Frequency:

  • Commit after each logical step (not just at end of day)
  • Each commit should leave codebase in working state
  • "Work in progress" commits OK on feature branches (clean before PR)

Commit Size:

  • Max 200 lines changed per commit
  • Max 5 files changed per commit (unless related)
  • Each commit reviewable in 5 minutes
  • Split large changes:
    # BAD: One giant commit
    git commit -am "Add federation + fix bugs + refactor + docs"
    
    # GOOD: Separate commits
    git commit -m "refactor(network): extract peer discovery logic"
    git commit -m "feat(federation): implement cross-swarm voting"
    git commit -m "fix(federation): handle peer timeout edge case"
    git commit -m "docs: update federation architecture docs"
    

Commit Message Format:

type(scope): subject (50 chars or less)

Body (wrap at 72 chars):
- Why this change was made
- What problem it solves  
- Any breaking changes or migration notes

Refs: #123, #456

Types:

  • feat: New feature
  • fix: Bug fix
  • refactor: Code restructuring (no behavior change)
  • test: Adding/updating tests
  • docs: Documentation only
  • chore: Build, dependencies, tooling
  • perf: Performance improvement
  • style: Formatting (no code change)

Subject Rules:

  • Use imperative mood: "Add feature" not "Added feature"
  • No period at end
  • Lowercase after type
  • Max 50 characters
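These subject rules can be checked mechanically. A hypothetical validator (the regex and the 50-chars-for-the-whole-line interpretation are assumptions, not project tooling; imperative mood is left to human review):

```python
import re

TYPES = "feat|fix|refactor|test|docs|chore|perf|style"
# type, optional (scope), ": ", lowercase start, no trailing period
SUBJECT_RE = re.compile(rf'^({TYPES})(\([a-z0-9_-]+\))?: [a-z][^\n]*[^.\n]$')

def valid_subject(line: str) -> bool:
    """Check type(scope): subject format, max 50 chars, no trailing period."""
    return len(line) <= 50 and bool(SUBJECT_RE.match(line))
```
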

Branch Hygiene

DO:

  • Create branch from latest main
  • Use descriptive branch names
  • Push branch to remote immediately
  • Rebase onto main regularly
  • Delete merged branches
  • Squash fixup commits before PR

DON'T:

  • Commit directly to main
  • Have long-lived branches (>1 week without rebase)
  • Include unrelated changes in one branch
  • Commit broken code (even temporarily)
  • Force push to shared branches
  • Merge without review

Handling Conflicts

# While rebasing
git rebase main
# Conflicts happen...

# Resolve conflicts in files
git add <resolved-files>
git rebase --continue

# If messed up, abort
git rebase --abort

Conflict Resolution Rules:

  1. Understand both changes before resolving
  2. Don't just pick "ours" or "theirs"
  3. Test after resolving
  4. Commit message should explain resolution

Emergency Procedures

Committed to wrong branch:

# Undo last commit (keep changes)
git reset HEAD~1

# Stash changes
git stash

# Switch to correct branch
git checkout correct-branch

# Apply changes
git stash pop

# Commit properly
git commit -m "..."

Need to undo pushed commit:

# Revert (creates new commit, safe for shared history)
git revert <commit-hash>
git push origin branch-name

# OR if feature branch not shared yet
# Reset and force push (DANGEROUS)
git reset --hard HEAD~1
git push --force-with-lease origin branch-name

Release Process

NOTE: Release process should be handled by AGENT_REVIEW.md agents.

# Create release branch
git checkout -b release/v1.2.0

# Bump version, update changelog
git commit -m "chore: bump version to 1.2.0"

# Tag release
git tag -a v1.2.0 -m "Release version 1.2.0"
git push origin v1.2.0

# Merge to main
git checkout main
git merge --no-ff release/v1.2.0
git push origin main

# Delete release branch
git branch -d release/v1.2.0

What Regular Agents Should NOT Do

REGULAR AGENTS DO NOT:

  • Make commits (git commit)
  • Create pull requests
  • Push to remote repositories
  • Merge branches
  • Manage git history (rebase, reset, etc.)
  • Delete branches

REGULAR AGENTS CAN:

  • Create and checkout branches (git checkout -b)
  • Stage files for review (git add)
  • Switch between branches

REGULAR AGENTS SHOULD:

  • Write code and tests
  • Run tests locally
  • Use logging instead of print()
  • Follow code quality standards
  • Document changes in code comments or design docs
  • Hand off completed work to AGENT_REVIEW.md agent for commit/PR creation

Example Workflow:

1. Agent reads task from user
2. Agent creates feature branch (git checkout -b feature/name)
3. Agent implements feature (writes code, tests, docs)
4. Agent stages changes for review (git add)
5. Agent reports completion with summary of changes
6. AGENT_REVIEW.md agent:
   - Reviews code quality
   - Makes commits
   - Creates PR

Pre-Commit Checklist

  • Code passes pytest (if tests exist)
  • No print() statements (use logging)
  • No bare except: clauses
  • All functions have type hints
  • All public functions have docstrings
  • No TODO comments (create issues instead)
  • Token count checked (if modifying prompts)

Testing Requirements

Unit Test Coverage

Minimum 80% coverage for:

  • Parsing functions
  • Business logic
  • State machines

Integration Tests Required For:

  • API endpoints
  • Tool execution
  • File operations
  • Network calls (mocked)

Test File Structure

tests/
├── unit/
│   ├── test_parser.py
│   ├── test_executor.py
│   └── test_consensus.py
├── integration/
│   ├── test_api.py
│   └── test_tools.py
└── fixtures/
    └── sample_responses.json

Code Quality Standards

Python Style

  • Follow PEP 8
  • Use type hints for all function signatures
  • Max line length: 100 characters
  • Max function length: 50 lines
  • Max file length: 300 lines (split if larger)

Imports (Order Matters)

# 1. Standard library
import os
import sys
from typing import List

# 2. Third party
import numpy as np
from fastapi import APIRouter

# 3. Local (absolute imports only)
from src.tools.executor import ToolExecutor
from src.swarm.manager import SwarmManager

Documentation Standards

Every module must have:

"""Module purpose in one line.

Longer description if needed (2-3 sentences max).
"""

Every public function must have:

def process_data(data: dict, options: Optional[dict] = None) -> Result:
    """Process data with given options.
    
    Args:
        data: Input data to process
        options: Processing options (default: None)
        
    Returns:
        Processed result
        
    Raises:
        ValueError: If data is invalid
    """

Architecture Rules

No Feature Flags in Core Logic

# BAD
if config.get("USE_NEW_PARSER", False):
    result = new_parser(text)
else:
    result = old_parser(text)

# GOOD: Pick one, remove the other
def parse_tool_calls(text: str) -> Tuple[str, List[dict]]:
    """Parse tool calls from text."""
    # Single implementation

No Code Duplication

  • If you copy-paste > 3 lines, extract to function
  • Shared code goes in src/common/ or src/utils/

Separation of Concerns

src/
├── parser/       # Only parsing logic
├── executor/     # Only execution logic
├── formatter/    # Only formatting/output
└── integration/  # Only API glue code

Forbidden Patterns

Never Do These:

  1. Bare except clauses - Always catch specific exceptions
  2. Production debugging - No print(), use logging
  3. Multiple return formats - One function = one return type
  4. Silent failures - Always log/report errors
  5. Magic numbers - Use named constants
  6. Global state - Use dependency injection
  7. Deep nesting - Max 3 levels of indentation
  8. Circular dependencies - Re-architect if needed
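Several of these patterns (bare excepts, silent failures, magic numbers) combined into one corrected sketch; `load_config` and `MAX_RETRIES` are illustrative names, not project code:

```python
import json
import logging

logger = logging.getLogger(__name__)

MAX_RETRIES = 3  # named constant, not a magic number scattered through code

def load_config(path: str) -> dict:
    """Catch specific exceptions and log them; never fail silently."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        logger.error("Config file not found: %s", path)
        return {}
    except json.JSONDecodeError as exc:
        logger.error("Invalid JSON in %s: %s", path, exc)
        return {}
```
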

Review Preparation

Before marking PR ready:

  1. Self-Review Checklist (check each item):

    • Tests pass: pytest -v
    • Type checking: mypy src/
    • Linting: ruff check src/
    • Formatting: black src/
    • Token count verified (if applicable)
    • No debug code left in
    • Commit messages follow format
    • Documentation updated
  2. PR Description Template:

    ## Changes
    - [Brief description]
    
    ## Testing
    - [How you tested it]
    
    ## Token Impact (if applicable)
    - Before: X tokens
    - After: Y tokens
    - Change: +/- Z tokens
    
    ## Checklist
    - [ ] Tests added/updated
    - [ ] Documentation updated
    - [ ] Self-review completed
    
  3. Run Final Verification:

    # Run all checks
    pytest && mypy src/ && ruff check src/ && black --check src/
    

Continuous Learning & Research

You MUST periodically research best practices and alternative implementations. This prevents stagnation and ensures we're using proven approaches.

When to Research

Before Major Features:

  • Spend 15-30 minutes researching similar implementations
  • Check: GitHub, Stack Overflow, official docs, research papers
  • Document findings in PR description

Monthly Reviews:

  • Review project's core technologies for updates
  • Check if better libraries/algorithms exist
  • Look for deprecated patterns we're using

When Stuck:

  • Don't brute force a solution
  • Research how others solved similar problems
  • Consider if problem indicates architectural issue

What to Research

1. Best Practices

# Search queries to use:
"python async best practices 2024"
"fastapi error handling patterns"
"LLM consensus voting algorithms"
"gguf quantization comparison"

2. Similar Implementations

  • Search GitHub for similar projects
  • Read their architecture decisions
  • Check their issues for pitfalls they hit
  • Note: Don't copy code blindly, understand WHY

3. Research Papers & Benchmarks

  • For consensus algorithms
  • For quantization strategies
  • For context window optimization
  • For distributed systems patterns

4. Library Updates

  • Check CHANGELOG of major dependencies
  • Review migration guides
  • Test new features in separate branch

Documentation of Research

Create research/YYYY-MM-DD-topic.md for significant findings:

# Research: [Topic]

**Date:** YYYY-MM-DD
**Researcher:** [Name]
**Trigger:** [Why researched this]

## Findings

### Option 1: [Name]
- Source: [Link]
- Pros: ...
- Cons: ...
- Complexity: Low/Medium/High

### Option 2: [Name]
- Source: [Link]
- Pros: ...
- Cons: ...
- Complexity: Low/Medium/High

## Recommendation
[Which option and WHY]

## Implementation Notes
[Specific code changes needed]

## Risks
[What could go wrong]

Research Checklist

Before implementing:

  • Searched for similar open-source implementations
  • Checked recent best practices (2023+)
  • Looked for benchmarking data if applicable
  • Reviewed alternative approaches
  • Considered long-term maintenance implications

After implementing:

  • Documented why chosen approach was selected
  • Added comments linking to research sources
  • Created test comparing against alternatives (if applicable)

Example Research Topics

Immediate:

  • "Python type hints best practices 2024"
  • "FastAPI dependency injection patterns"
  • "LLM tool use format comparison"

Short-term:

  • "Consensus algorithms for distributed LLM systems"
  • "Context window compression techniques"
  • "GGUF quantization vs other formats"

Long-term:

  • "Speculative decoding implementation"
  • "PagedAttention for multiple workers"
  • "RAG integration patterns"

Research Sources

Reliable:

  • Official documentation (Python, FastAPI, etc.)
  • Well-maintained GitHub repos (>1k stars, active)
  • Recent conference talks (PyCon, NeurIPS, etc.)
  • Research papers with code (Papers With Code)
  • Official blogs (Python.org, FastAPI.tiangolo.com)

Use with Caution:

  • Medium articles (variable quality)
  • Old Stack Overflow answers (>2 years)
  • Tutorial sites (often outdated)
  • YouTube videos (hard to verify)

Integration with Development

Weekly:

  • Spend 30 minutes reading about one technology we use
  • Note any improvements we could make
  • Create issues for promising findings

Monthly:

  • Review all open research issues
  • Prioritize based on impact vs effort
  • Schedule implementation of high-value items

Quarterly:

  • Architecture review: Are our patterns still best?
  • Dependency audit: Updates needed?
  • Performance review: Could we be faster?

Remember:

  • Research prevents reinvention of the wheel
  • But don't research forever - timebox it (30 min max for most decisions)
  • Document findings so others don't repeat the research
  • Apply critical thinking - "best practice" depends on context

Breaking This Ruleset

If you MUST break a rule:

  1. Document WHY in code comments
  2. Get explicit approval in PR
  3. Create follow-up issue to fix properly
  4. Never break Rule 4 (No Production Debugging)

Remember: Quality over speed. A fix that takes 2 days with tests is better than a fix that takes 2 hours and breaks 3 other things.