Agent Worker Rules
⚠️ IMPORTANT: This document is for IMPLEMENTATION AGENTS (coding, testing, documentation). DO NOT MAKE COMMITS - that's the AGENT_REVIEW.md agent's job.
Pre-Flight Checklist (MUST complete before coding)
⚠️ GIT OPERATIONS REMINDER
DO NOT make commits. Commits are ONLY handled by AGENT_REVIEW.md agents. You CAN create branches and stage files (git add), but DO NOT commit (git commit).
1. Token Budget Verification
- System prompt + instructions ≤ 2000 tokens (hard limit)
- Leave ≥ 50% of context window for user input
- If adding documentation/examples, remove old ones to maintain budget
- Use `tiktoken` or estimate: ~4 chars ≈ 1 token
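A minimal sketch of that estimate (assumes `tiktoken` may or may not be installed; the encoding name and the 4-chars-per-token fallback are approximations):

```python
def estimate_tokens(text: str) -> int:
    """Count tokens with tiktoken if available, else ~4 chars per token."""
    try:
        import tiktoken  # optional dependency
        return len(tiktoken.get_encoding("cl100k_base").encode(text))
    except ImportError:
        return max(1, len(text) // 4)

# Fail fast if a prompt blows the 2000-token budget
prompt = "You are a helpful agent." * 10
assert estimate_tokens(prompt) <= 2000
```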
2. Test Plan Required
Before writing ANY code, write a test plan:
```markdown
## Test Plan for [Feature]

### Unit Tests
- [ ] Test case 1: [specific input] → [expected output]
- [ ] Test case 2: [edge case]
- [ ] Test case 3: [error condition]

### Integration Tests
- [ ] End-to-end flow: [steps]
- [ ] Expected result: [what success looks like]

### Manual Verification
- [ ] Command to run: [exact command]
- [ ] Expected output: [what to see]
```
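For illustration, here is how one row of such a plan becomes executable tests (the `slugify` helper is hypothetical, not part of this codebase):

```python
import re

def slugify(title: str) -> str:
    """Lowercase, replace non-alphanumeric runs with hyphens, trim hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

# Test case 1: specific input → expected output
assert slugify("Hello World") == "hello-world"
# Test case 2: edge case (already clean)
assert slugify("abc") == "abc"
# Test case 3: error-ish condition (empty input)
assert slugify("") == ""
```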
3. Design Decision Document
For any change > 50 lines:
```markdown
## Design Decision

### Problem
[What are we solving?]

### Options Considered
1. [Option A] - Pros: ..., Cons: ...
2. [Option B] - Pros: ..., Cons: ...

### Decision
[Which option and WHY]

### Impact
- Token count change: [+/- X tokens]
- Breaking changes: [Yes/No]
- Migration needed: [Yes/No]
```
Coding Rules
Rule 1: One Feature = One Commit
NOTE: Regular agents DO NOT make commits. AGENT_REVIEW.md agents handle commits.
When AGENT_REVIEW.md agents make commits:
- Never combine unrelated changes in one commit
- If you fix a bug AND refactor, make 2 commits
- Commit message format: `type(scope): description`
- Types: `feat`, `fix`, `refactor`, `test`, `docs`, `chore`
- Example: `feat(tools): add working directory support`
Rule 2: Tests First (TDD)
```python
# BAD: Write code, maybe test later
def parse_tools(text):
    # ... implementation ...
    pass

# GOOD: Write test first
def test_parse_simple_tool():
    text = 'TOOL: read\nARGUMENTS: {"filePath": "test.txt"}'
    content, tools = parse_tool_calls(text)
    assert len(tools) == 1
    assert tools[0]["function"]["name"] == "read"

# Then write minimal code to pass
```
Rule 3: Minimal, Maintainable, Modular Code
Core Focus: Keep code minimal, maintainable, and modular.
Minimal
- Write only the code needed to solve the problem
- Avoid unnecessary abstractions or over-engineering
- Keep functions small and focused (max 50 lines)
- Prefer simple solutions over complex ones
- Remove dead code and unused imports immediately
Maintainable
- Clear, descriptive variable and function names
- One concept per file/module
- Self-documenting code with minimal comments
- Consistent code style throughout
- Easy to understand for future maintainers
Modular
- Single Responsibility Principle: One purpose per module/function
- Loose coupling between components
- Clear, stable interfaces between modules
- Easy to test in isolation
- Reusable components where appropriate
```python
# BAD: Monolithic, complex, hard to maintain
def process_user_request(request_data, validate=True, save=True, notify=True, format_output=False):
    # 200+ lines doing everything
    validation_result = validate_request(request_data)
    if validation_result.is_valid:
        if save:
            db_connection = get_db_connection()
            cursor = db_connection.cursor()
            cursor.execute("INSERT INTO requests ...", request_data)
            db_connection.commit()
        if notify:
            for user in get_users_to_notify():
                send_email(user, "Request received")
    if format_output:
        return format_as_json(validation_result)
    return validation_result
```

```python
# GOOD: Minimal, modular, maintainable
def validate_request(data: dict) -> ValidationResult:
    """Validate request data."""
    return ValidationResult(is_valid=len(data) > 0)

def save_request(data: dict) -> str:
    """Save request to database."""
    return db.insert("requests", data)

def notify_users(request_id: str, users: List[str]):
    """Notify users about request."""
    for user in users:
        send_email(user, f"Request {request_id} received")
```
Rule 4: No Production Debugging
- NEVER add `print()` statements for debugging
- Use the `logging` module with appropriate levels
- Remove ALL debug logging before committing
- Exception: Structured logging for observability (metrics, errors)
```python
# BAD
def process_request(request):
    print(f"DEBUG: Got request {request}")  # REMOVE THIS
    result = handle(request)
    print(f"DEBUG: Result {result}")  # REMOVE THIS
    return result

# GOOD
def process_request(request):
    logger.debug("Processing request", extra={"request_id": request.id})
    result = handle(request)
    return result
```
Rule 5: Architecture Consistency
- Check ARCHITECTURE.md before changing patterns
- If unsure, ask in PR description
- NEVER change architecture in a "fix" commit
- Architecture changes require design doc + team review
Rule 6: Parse Once, Parse Well
- ONE parser per format
- If adding a new format, remove the old one
- Parser must handle all documented cases
- Parser must fail gracefully (return empty, not crash)
```python
# BAD: Multiple parsers for same thing
def parse_tools_v1(text): ...
def parse_tools_v2(text): ...
def parse_tools_legacy(text): ...
```

```python
# GOOD: Single parser with clear regex
import re
from typing import List, Tuple

TOOL_PATTERN = r'TOOL:\s*(\w+)\s*\nARGUMENTS:\s*(\{[^}]*\})'

def parse_tool_calls(text: str) -> Tuple[str, List[dict]]:
    matches = list(re.finditer(TOOL_PATTERN, text, re.IGNORECASE))
    if not matches:
        return text, []
    tools = [
        {"function": {"name": m.group(1), "arguments": m.group(2)}}
        for m in matches
    ]
    content = re.sub(TOOL_PATTERN, "", text, flags=re.IGNORECASE).strip()
    return content, tools
```
Rule 7: Token-Aware Documentation
- Every docstring/example has a token cost
- Count tokens before adding
- If over budget, remove something else
- Prioritize: Code clarity > Examples > Explanations
```python
# BAD: 150 tokens of fluff
def calculate(x, y):
    """
    This function calculates the sum of two numbers.
    The sum is calculated by using the built-in Python
    addition operator which adds the values together.

    Args:
        x (int): The first number to add
        y (int): The second number to add

    Returns:
        int: The sum of x and y

    Example:
        >>> calculate(1, 2)
        3
    """
    return x + y

# GOOD: 20 tokens, clear enough
def calculate(x: int, y: int) -> int:
    """Return sum of x and y."""
    return x + y
```
Rule 8: Clear Error Messages
- Every error must tell the user EXACTLY what went wrong
- Include context: what was expected vs what was received
- Suggest a fix if possible
```python
# BAD
raise ValueError("Invalid input")

# GOOD
raise ValueError(f"Invalid model format: '{model_str}'. Expected: 'name:size:quant' (e.g., 'qwen:7b:q4')")
```
Rule 9: No Circular Imports
```python
# BAD: src/a.py imports src/b.py, src/b.py imports src/a.py
# GOOD: Use dependency injection or move shared code to a common module
```
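The shared-module fix can be sketched in a single file (module boundaries shown as comments; in a real repo these would be separate files, and all names here are hypothetical):

```python
# common.py — shared helper both modules need
def normalize(name: str) -> str:
    return name.strip().lower()

# a.py — imports only common
def register(name: str) -> str:
    return f"registered:{normalize(name)}"

# b.py — imports only common; extra behavior arrives via
# dependency injection, not by importing a.py
def audit(name: str, action) -> str:
    return action(normalize(name))

assert register("  Alice ") == "registered:alice"
assert audit("Alice", lambda n: f"audited:{n}") == "audited:alice"
```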
Git Workflow Rules
CRITICAL: Commit Handling
REGULAR AGENTS: DO NOT MAKE COMMITS
- Regular agents do NOT create commits, pull requests, or manage git history
- Commits are ONLY handled by agents following AGENT_REVIEW.md guidelines
- If you need to commit code, the AGENT_REVIEW.md agent should handle it
- Exception: You may manually stage files (git add) for the review agent
- You CAN create and checkout branches (that's fine) - just don't commit to them
Branch Strategy
Main Branches (Protected):
- `main` - Production-ready code only
- `develop` - Integration branch for features (optional for small projects)
Working Branches (Temporary - AGENT_REVIEW.md ONLY):
```
feature/description     # New features
fix/description         # Bug fixes
refactor/description    # Code refactoring
hotfix/description      # Critical production fixes
docs/description        # Documentation only
experiment/description  # Experimental work (may be deleted)
```
Note: Regular agents may create and check out branches, but must NOT commit to them or perform other git history operations
Workflow Steps
1. Starting New Work
```shell
# ALWAYS start from main
git checkout main
git pull origin main

# Create feature branch
git checkout -b feature/description

# Push branch to remote immediately
git push -u origin feature/description
```
2. During Development
```shell
# Commit often (small, logical commits)
git add -p  # Stage interactively (review each change)
git commit -m "feat(scope): description"

# Push regularly (backup)
git push origin feature/description

# Keep up-to-date with main
git fetch origin
git rebase origin/main  # Resolve conflicts immediately
```
3. Before PR (Final Cleanup)
```shell
# Interactive rebase to clean history
git rebase -i main

# Squash these:
# - "fix typo"
# - "WIP"
# - "asdf"
# - "omg finally"
# - Multiple attempts at same fix

# Keep separate:
# - Logical feature steps
# - Refactoring separate from features
# - Test additions separate from code changes
```
4. Creating PR
- Push final branch: `git push origin feature/description`
- Create PR to `main` (not `develop` unless project uses git-flow)
- Fill PR template completely
- Request review from AGENT_REVIEW.md qualified reviewer
- Link related issues: `Closes #123`, `Fixes #456`
Commit Rules
Commit Frequency:
- Commit after each logical step (not just at end of day)
- Each commit should leave codebase in working state
- "Work in progress" commits OK on feature branches (clean before PR)
Commit Size:
- Max 200 lines changed per commit
- Max 5 files changed per commit (unless related)
- Each commit reviewable in 5 minutes
- Split large changes:
```shell
# BAD: One giant commit
git commit -am "Add federation + fix bugs + refactor + docs"

# GOOD: Separate commits
git commit -m "refactor(network): extract peer discovery logic"
git commit -m "feat(federation): implement cross-swarm voting"
git commit -m "fix(federation): handle peer timeout edge case"
git commit -m "docs: update federation architecture docs"
```
Commit Message Format:
```
type(scope): subject (50 chars or less)

Body (wrap at 72 chars):
- Why this change was made
- What problem it solves
- Any breaking changes or migration notes

Refs: #123, #456
```
Types:
- `feat`: New feature
- `fix`: Bug fix
- `refactor`: Code restructuring (no behavior change)
- `test`: Adding/updating tests
- `docs`: Documentation only
- `chore`: Build, dependencies, tooling
- `perf`: Performance improvement
- `style`: Formatting (no code change)
Subject Rules:
- Use imperative mood: "Add feature" not "Added feature"
- No period at end
- Lowercase after type
- Max 50 characters
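As an illustration, these subject rules could be checked mechanically; the pattern below is a sketch, not an official linter, and cannot verify imperative mood:

```python
import re

# type(scope): subject — scope optional, lowercase subject, no trailing period
COMMIT_RE = re.compile(
    r"^(feat|fix|refactor|test|docs|chore|perf|style)"
    r"(\([a-z0-9_-]+\))?: [a-z].{0,48}[^.]$"
)

def check_subject(line: str) -> bool:
    """Return True if the commit subject follows the stated rules."""
    return len(line) <= 50 and COMMIT_RE.match(line) is not None

assert check_subject("feat(tools): add working directory support")
assert not check_subject("fix: Fixed the thing.")  # capital + trailing period
assert not check_subject("update stuff")           # missing type
```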
Branch Hygiene
DO:
- Create branch from latest main
- Use descriptive branch names
- Push branch to remote immediately
- Rebase onto main regularly
- Delete merged branches
- Squash fixup commits before PR
DON'T:
- Commit directly to main
- Have long-lived branches (>1 week without rebase)
- Include unrelated changes in one branch
- Commit broken code (even temporarily)
- Force push to shared branches
- Merge without review
Handling Conflicts
```shell
# While rebasing
git rebase main
# Conflicts happen...

# Resolve conflicts in files
git add <resolved-files>
git rebase --continue

# If messed up, abort
git rebase --abort
```
Conflict Resolution Rules:
- Understand both changes before resolving
- Don't just pick "ours" or "theirs"
- Test after resolving
- Commit message should explain resolution
Emergency Procedures
Committed to wrong branch:
```shell
# Undo last commit (keep changes)
git reset HEAD~1

# Stash changes
git stash

# Switch to correct branch
git checkout correct-branch

# Apply changes
git stash pop

# Commit properly
git commit -m "..."
```
Need to undo pushed commit:
```shell
# Revert (creates new commit, safe for shared history)
git revert <commit-hash>
git push origin branch-name

# OR, if feature branch not shared yet:
# reset and force push (DANGEROUS)
git reset --hard HEAD~1
git push --force-with-lease origin branch-name
```
Release Process
NOTE: Release process should be handled by AGENT_REVIEW.md agents.
```shell
# Create release branch
git checkout -b release/v1.2.0

# Bump version, update changelog
git commit -m "chore: bump version to 1.2.0"

# Tag release
git tag -a v1.2.0 -m "Release version 1.2.0"
git push origin v1.2.0

# Merge to main
git checkout main
git merge --no-ff release/v1.2.0
git push origin main

# Delete release branch
git branch -d release/v1.2.0
```
What Regular Agents Should NOT Do
REGULAR AGENTS DO NOT:
- Make commits (git commit)
- Create pull requests
- Push to remote repositories
- Merge branches
- Manage git history (rebase, reset, etc.)
- Delete branches
REGULAR AGENTS CAN:
- Create and checkout branches (git checkout -b)
- Stage files for review (git add)
- Switch between branches
REGULAR AGENTS SHOULD:
- Write code and tests
- Run tests locally
- Use logging instead of print()
- Follow code quality standards
- Document changes in code comments or design docs
- Hand off completed work to AGENT_REVIEW.md agent for commit/PR creation
Example Workflow:
1. Agent reads task from user
2. Agent creates feature branch (git checkout -b feature/name)
3. Agent implements feature (writes code, tests, docs)
4. Agent stages changes for review (git add)
5. Agent reports completion with summary of changes
6. AGENT_REVIEW.md agent:
- Reviews code quality
- Makes commits
- Creates PR
Pre-Commit Checklist
- Code passes `pytest` (if tests exist)
- No `print()` statements (use logging)
- No bare `except:` clauses
- All functions have type hints
- All public functions have docstrings
- No TODO comments (create issues instead)
- Token count checked (if modifying prompts)
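Two of these checks (no `print()`, no bare `except:`) can be sketched with Python's `ast` module; treat this as an illustration, not a full linter:

```python
import ast

def find_violations(source: str) -> list:
    """Flag print() calls and bare except clauses with line numbers."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "print"):
            violations.append((node.lineno, "print() call"))
        if isinstance(node, ast.ExceptHandler) and node.type is None:
            violations.append((node.lineno, "bare except"))
    return violations

code = "try:\n    print('debug')\nexcept:\n    pass\n"
assert len(find_violations(code)) == 2
```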
Testing Requirements
Unit Test Coverage
Minimum 80% coverage for:
- Parsing functions
- Business logic
- State machines
Integration Tests Required For:
- API endpoints
- Tool execution
- File operations
- Network calls (mocked)
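A sketch of mocking a network call in a test using `unittest.mock` (the `fetch_status` helper is hypothetical):

```python
from unittest.mock import patch
import urllib.request

def fetch_status(url: str) -> int:
    """Return HTTP status for url (hypothetical helper under test)."""
    with urllib.request.urlopen(url) as resp:
        return resp.status

# Test with the network call mocked out — no real request is made
with patch("urllib.request.urlopen") as mock_open:
    mock_open.return_value.__enter__.return_value.status = 200
    assert fetch_status("https://example.com/health") == 200
    mock_open.assert_called_once_with("https://example.com/health")
```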
Test File Structure
```
tests/
├── unit/
│   ├── test_parser.py
│   ├── test_executor.py
│   └── test_consensus.py
├── integration/
│   ├── test_api.py
│   └── test_tools.py
└── fixtures/
    └── sample_responses.json
```
Code Quality Standards
Python Style
- Follow PEP 8
- Use type hints for all function signatures
- Max line length: 100 characters
- Max function length: 50 lines
- Max file length: 300 lines (split if larger)
Imports (Order Matters)
```python
# 1. Standard library
import os
import sys
from typing import List

# 2. Third party
import numpy as np
from fastapi import APIRouter

# 3. Local (absolute imports only)
from src.tools.executor import ToolExecutor
from src.swarm.manager import SwarmManager
```
Documentation Standards
Every module must have:
```python
"""Module purpose in one line.

Longer description if needed (2-3 sentences max).
"""
```
Every public function must have:
```python
def process_data(data: dict, options: Optional[dict] = None) -> Result:
    """Process data with given options.

    Args:
        data: Input data to process
        options: Processing options (default: None)

    Returns:
        Processed result

    Raises:
        ValueError: If data is invalid
    """
```
Architecture Rules
No Feature Flags in Core Logic
```python
# BAD
if config.get("USE_NEW_PARSER", False):
    result = new_parser(text)
else:
    result = old_parser(text)

# GOOD: Pick one, remove the other
def parse_tool_calls(text: str) -> Tuple[str, List[dict]]:
    """Parse tool calls from text."""
    # Single implementation
```
No Code Duplication
- If you copy-paste > 3 lines, extract to a function
- Shared code goes in `src/common/` or `src/utils/`
Separation of Concerns
```
src/
├── parser/       # Only parsing logic
├── executor/     # Only execution logic
├── formatter/    # Only formatting/output
└── integration/  # Only API glue code
```
Forbidden Patterns
Never Do These:
- Bare except clauses - Always catch specific exceptions
- Production debugging - No `print()`, use logging
- Multiple return formats - One function = one return type
- Silent failures - Always log/report errors
- Magic numbers - Use named constants
- Global state - Use dependency injection
- Deep nesting - Max 3 levels of indentation
- Circular dependencies - Re-architect if needed
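A small sketch contrasting two of these patterns, named constants and specific exceptions; the config-loading function is illustrative:

```python
import json

MAX_RETRIES = 3  # named constant instead of a magic number

def load_config(raw: str) -> dict:
    """Parse config, failing loudly with a specific exception."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:  # specific, not a bare `except:`
        raise ValueError(f"Invalid config JSON: {exc}") from exc

assert load_config('{"retries": 3}') == {"retries": 3}
try:
    load_config("not json")
except ValueError as exc:
    assert "Invalid config JSON" in str(exc)
```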
Review Preparation
Before marking PR ready:
1. Self-Review Checklist (check each item):
   - Tests pass: `pytest -v`
   - Type checking: `mypy src/`
   - Linting: `ruff check src/`
   - Formatting: `black src/`
   - Token count verified (if applicable)
   - No debug code left in
   - Commit messages follow format
   - Documentation updated
2. PR Description Template:
```markdown
## Changes
- [Brief description]

## Testing
- [How you tested it]

## Token Impact (if applicable)
- Before: X tokens
- After: Y tokens
- Change: +/- Z tokens

## Checklist
- [ ] Tests added/updated
- [ ] Documentation updated
- [ ] Self-review completed
```
3. Run Final Verification:
```shell
# Run all checks
pytest && mypy src/ && ruff check src/ && black --check src/
```
Continuous Learning & Research
You MUST periodically research best practices and alternative implementations. This prevents stagnation and ensures we're using proven approaches.
When to Research
Before Major Features:
- Spend 15-30 minutes researching similar implementations
- Check: GitHub, Stack Overflow, official docs, research papers
- Document findings in PR description
Monthly Reviews:
- Review project's core technologies for updates
- Check if better libraries/algorithms exist
- Look for deprecated patterns we're using
When Stuck:
- Don't brute force a solution
- Research how others solved similar problems
- Consider if problem indicates architectural issue
What to Research
1. Best Practices
```
# Search queries to use:
"python async best practices 2024"
"fastapi error handling patterns"
"LLM consensus voting algorithms"
"gguf quantization comparison"
```
2. Similar Implementations
- Search GitHub for similar projects
- Read their architecture decisions
- Check their issues for pitfalls they hit
- Note: Don't copy code blindly, understand WHY
3. Research Papers & Benchmarks
- For consensus algorithms
- For quantization strategies
- For context window optimization
- For distributed systems patterns
4. Library Updates
- Check CHANGELOG of major dependencies
- Review migration guides
- Test new features in separate branch
Documentation of Research
Create research/YYYY-MM-DD-topic.md for significant findings:
```markdown
# Research: [Topic]

**Date:** YYYY-MM-DD
**Researcher:** [Name]
**Trigger:** [Why researched this]

## Findings

### Option 1: [Name]
- Source: [Link]
- Pros: ...
- Cons: ...
- Complexity: Low/Medium/High

### Option 2: [Name]
- Source: [Link]
- Pros: ...
- Cons: ...
- Complexity: Low/Medium/High

## Recommendation
[Which option and WHY]

## Implementation Notes
[Specific code changes needed]

## Risks
[What could go wrong]
```
Research Checklist
Before implementing:
- Searched for similar open-source implementations
- Checked recent best practices (2023+)
- Looked for benchmarking data if applicable
- Reviewed alternative approaches
- Considered long-term maintenance implications
After implementing:
- Documented why chosen approach was selected
- Added comments linking to research sources
- Created test comparing against alternatives (if applicable)
Example Research Topics
Immediate:
- "Python type hints best practices 2024"
- "FastAPI dependency injection patterns"
- "LLM tool use format comparison"
Short-term:
- "Consensus algorithms for distributed LLM systems"
- "Context window compression techniques"
- "GGUF quantization vs other formats"
Long-term:
- "Speculative decoding implementation"
- "PagedAttention for multiple workers"
- "RAG integration patterns"
Research Sources
Reliable:
- Official documentation (Python, FastAPI, etc.)
- Well-maintained GitHub repos (>1k stars, active)
- Recent conference talks (PyCon, NeurIPS, etc.)
- Research papers with code (Papers With Code)
- Official blogs (Python.org, FastAPI.tiangolo.com)
Use with Caution:
- Medium articles (variable quality)
- Old Stack Overflow answers (>2 years)
- Tutorial sites (often outdated)
- YouTube videos (hard to verify)
Integration with Development
Weekly:
- Spend 30 minutes reading about one technology we use
- Note any improvements we could make
- Create issues for promising findings
Monthly:
- Review all open research issues
- Prioritize based on impact vs effort
- Schedule implementation of high-value items
Quarterly:
- Architecture review: Are our patterns still best?
- Dependency audit: Updates needed?
- Performance review: Could we be faster?
Remember:
- Research prevents reinvention of the wheel
- But don't research forever - timebox it (30 min max for most decisions)
- Document findings so others don't repeat the research
- Apply critical thinking - "best practice" depends on context
Breaking This Ruleset
If you MUST break a rule:
- Document WHY in code comments
- Get explicit approval in PR
- Create follow-up issue to fix properly
- Never break Rule 4 (No Production Debugging)
Remember: Quality over speed. A fix that takes 2 days with tests is better than a fix that takes 2 hours and breaks 3 other things.