Agent Worker Rules
⚠️ IMPORTANT: This document is for IMPLEMENTATION AGENTS (coding, testing, documentation). DO NOT MAKE COMMITS - that's the AGENT_REVIEW.md agent's job.
Pre-Flight Checklist (MUST complete before coding)
⚠️ GIT OPERATIONS REMINDER
DO NOT make commits. Commits are ONLY handled by AGENT_REVIEW.md agents. You CAN create branches and stage files (git add), but DO NOT commit (git commit).
1. Token Budget Verification
- System prompt + instructions ≤ 2000 tokens (hard limit)
- Leave ≥ 50% of context window for user input
- If adding documentation/examples, remove old ones to maintain budget
- Use `tiktoken` or estimate: ~4 chars ≈ 1 token
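A minimal sketch of that estimate (assumes `tiktoken` may or may not be installed; the encoding name and the 4-chars-per-token fallback are approximations):

```python
def estimate_tokens(text: str) -> int:
    """Count tokens with tiktoken if available, else ~4 chars per token."""
    try:
        import tiktoken  # optional dependency
        return len(tiktoken.get_encoding("cl100k_base").encode(text))
    except ImportError:
        return max(1, len(text) // 4)

# Fail fast if a prompt blows the 2000-token budget
prompt = "You are a helpful agent." * 10
assert estimate_tokens(prompt) <= 2000
```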
2. Test Plan Required
Before writing ANY code, write a test plan:
```markdown
## Test Plan for [Feature]

### Unit Tests
- [ ] Test case 1: [specific input] → [expected output]
- [ ] Test case 2: [edge case]
- [ ] Test case 3: [error condition]

### Integration Tests
- [ ] End-to-end flow: [steps]
- [ ] Expected result: [what success looks like]

### Manual Verification
- [ ] Command to run: [exact command]
- [ ] Expected output: [what to see]
```
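For illustration, here is how one row of such a plan becomes executable tests (the `slugify` helper is hypothetical, not part of this codebase):

```python
import re

def slugify(title: str) -> str:
    """Lowercase, replace non-alphanumeric runs with hyphens, trim hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

# Test case 1: specific input → expected output
assert slugify("Hello World") == "hello-world"
# Test case 2: edge case (already clean)
assert slugify("abc") == "abc"
# Test case 3: error-ish condition (empty input)
assert slugify("") == ""
```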
3. Design Decision Document
For any change > 50 lines:
```markdown
## Design Decision

### Problem
[What are we solving?]

### Options Considered
1. [Option A] - Pros: ..., Cons: ...
2. [Option B] - Pros: ..., Cons: ...

### Decision
[Which option and WHY]

### Impact
- Token count change: [+/- X tokens]
- Breaking changes: [Yes/No]
- Migration needed: [Yes/No]
```
Coding Rules
Rule 1: One Feature = One Commit
NOTE: Regular agents DO NOT make commits. AGENT_REVIEW.md agents handle commits.
When AGENT_REVIEW.md agents make commits:
- Never combine unrelated changes in one commit
- If you fix a bug AND refactor, make 2 commits
- Commit message format: `type(scope): description`
- Types: `feat`, `fix`, `refactor`, `test`, `docs`, `chore`
- Example: `feat(tools): add working directory support`
Rule 2: Tests First (TDD)
```python
# BAD: Write code, maybe test later
def parse_tools(text):
    # ... implementation ...
    pass

# GOOD: Write test first
def test_parse_simple_tool():
    text = 'TOOL: read\nARGUMENTS: {"filePath": "test.txt"}'
    content, tools = parse_tool_calls(text)
    assert len(tools) == 1
    assert tools[0]["function"]["name"] == "read"

# Then write minimal code to pass
```
Rule 3: Minimal, Maintainable, Modular Code
Core Focus: Keep code minimal, maintainable, and modular.
Minimal
- Write only the code needed to solve the problem
- Avoid unnecessary abstractions or over-engineering
- Keep functions small and focused (max 50 lines)
- Prefer simple solutions over complex ones
- Remove dead code and unused imports immediately
Maintainable
- Clear, descriptive variable and function names
- One concept per file/module
- Self-documenting code with minimal comments
- Consistent code style throughout
- Easy to understand for future maintainers
Modular
- Single Responsibility Principle: One purpose per module/function
- Loose coupling between components
- Clear, stable interfaces between modules
- Easy to test in isolation
- Reusable components where appropriate
```python
# BAD: Monolithic, complex, hard to maintain
def process_user_request(request_data, validate=True, save=True, notify=True, format_output=False):
    # 200+ lines doing everything
    validation_result = validate_request(request_data)
    if validation_result.is_valid:
        if save:
            db_connection = get_db_connection()
            cursor = db_connection.cursor()
            cursor.execute("INSERT INTO requests ...", request_data)
            db_connection.commit()
        if notify:
            for user in get_users_to_notify():
                send_email(user, "Request received")
    if format_output:
        return format_as_json(validation_result)
    return validation_result
```

```python
# GOOD: Minimal, modular, maintainable
def validate_request(data: dict) -> ValidationResult:
    """Validate request data."""
    return ValidationResult(is_valid=len(data) > 0)

def save_request(data: dict) -> str:
    """Save request to database."""
    return db.insert("requests", data)

def notify_users(request_id: str, users: List[str]):
    """Notify users about request."""
    for user in users:
        send_email(user, f"Request {request_id} received")
```
Rule 4: No Production Debugging
- NEVER add `print()` statements for debugging
- Use the `logging` module with appropriate levels
- Remove ALL debug logging before committing
- Exception: Structured logging for observability (metrics, errors)
```python
# BAD
def process_request(request):
    print(f"DEBUG: Got request {request}")  # REMOVE THIS
    result = handle(request)
    print(f"DEBUG: Result {result}")  # REMOVE THIS
    return result

# GOOD
def process_request(request):
    logger.debug("Processing request", extra={"request_id": request.id})
    result = handle(request)
    return result
```
Rule 5: Architecture Consistency
- Check ARCHITECTURE.md before changing patterns
- If unsure, ask in PR description
- NEVER change architecture in a "fix" commit
- Architecture changes require design doc + team review
Rule 6: Parse Once, Parse Well
- ONE parser per format
- If adding a new format, remove the old one
- Parser must handle all documented cases
- Parser must fail gracefully (return empty, not crash)
```python
# BAD: Multiple parsers for same thing
def parse_tools_v1(text): ...
def parse_tools_v2(text): ...
def parse_tools_legacy(text): ...
```

```python
# GOOD: Single parser with clear regex
import re
from typing import List, Tuple

TOOL_PATTERN = r'TOOL:\s*(\w+)\s*\nARGUMENTS:\s*(\{[^}]*\})'

def parse_tool_calls(text: str) -> Tuple[str, List[dict]]:
    matches = list(re.finditer(TOOL_PATTERN, text, re.IGNORECASE))
    if not matches:
        return text, []
    tools = [
        {"function": {"name": m.group(1), "arguments": m.group(2)}}
        for m in matches
    ]
    content = re.sub(TOOL_PATTERN, "", text, flags=re.IGNORECASE).strip()
    return content, tools
```
Rule 7: Token-Aware Documentation
- Every docstring/example has a token cost
- Count tokens before adding
- If over budget, remove something else
- Prioritize: Code clarity > Examples > Explanations
```python
# BAD: 150 tokens of fluff
def calculate(x, y):
    """
    This function calculates the sum of two numbers.
    The sum is calculated by using the built-in Python
    addition operator which adds the values together.

    Args:
        x (int): The first number to add
        y (int): The second number to add

    Returns:
        int: The sum of x and y

    Example:
        >>> calculate(1, 2)
        3
    """
    return x + y

# GOOD: 20 tokens, clear enough
def calculate(x: int, y: int) -> int:
    """Return sum of x and y."""
    return x + y
```
Rule 8: Clear Error Messages
- Every error must tell the user EXACTLY what went wrong
- Include context: what was expected vs what was received
- Suggest a fix if possible
```python
# BAD
raise ValueError("Invalid input")

# GOOD
raise ValueError(f"Invalid model format: '{model_str}'. Expected: 'name:size:quant' (e.g., 'qwen:7b:q4')")
```
Rule 9: No Circular Imports
```python
# BAD: src/a.py imports src/b.py, src/b.py imports src/a.py
# GOOD: Use dependency injection or move shared code to a common module
```
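The shared-module fix can be sketched in a single file (module boundaries shown as comments; in a real repo these would be separate files, and all names here are hypothetical):

```python
# common.py — shared helper both modules need
def normalize(name: str) -> str:
    return name.strip().lower()

# a.py — imports only common
def register(name: str) -> str:
    return f"registered:{normalize(name)}"

# b.py — imports only common; extra behavior arrives via
# dependency injection, not by importing a.py
def audit(name: str, action) -> str:
    return action(normalize(name))

assert register("  Alice ") == "registered:alice"
assert audit("Alice", lambda n: f"audited:{n}") == "audited:alice"
```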
Git Workflow Rules
CRITICAL: Commit Handling
REGULAR AGENTS: DO NOT MAKE COMMITS
- Regular agents do NOT create commits, pull requests, or manage git history
- Commits are ONLY handled by agents following AGENT_REVIEW.md guidelines
- If you need to commit code, the AGENT_REVIEW.md agent should handle it
- Exception: You may manually stage files (git add) for the review agent
- You CAN create and checkout branches (that's fine) - just don't commit to them
Branch Strategy
Main Branches (Protected):
- `main` - Production-ready code only
- `develop` - Integration branch for features (optional for small projects)
Working Branches (Temporary - AGENT_REVIEW.md ONLY):
```
feature/description     # New features
fix/description         # Bug fixes
refactor/description    # Code refactoring
hotfix/description      # Critical production fixes
docs/description        # Documentation only
experiment/description  # Experimental work (may be deleted)
```
Note: Regular agents may create and check out branches, but must NOT commit to them or perform other git history operations
Workflow Steps
1. Starting New Work
```shell
# ALWAYS start from main
git checkout main
git pull origin main

# Create feature branch
git checkout -b feature/description

# Push branch to remote immediately
git push -u origin feature/description
```
2. During Development
```shell
# Commit often (small, logical commits)
git add -p  # Stage interactively (review each change)
git commit -m "feat(scope): description"

# Push regularly (backup)
git push origin feature/description

# Keep up-to-date with main
git fetch origin
git rebase origin/main  # Resolve conflicts immediately
```
3. Before PR (Final Cleanup)
```shell
# Interactive rebase to clean history
git rebase -i main

# Squash these:
# - "fix typo"
# - "WIP"
# - "asdf"
# - "omg finally"
# - Multiple attempts at same fix

# Keep separate:
# - Logical feature steps
# - Refactoring separate from features
# - Test additions separate from code changes
```
4. Creating PR
- Push final branch: `git push origin feature/description`
- Create PR to `main` (not `develop` unless project uses git-flow)
- Fill PR template completely
- Request review from AGENT_REVIEW.md qualified reviewer
- Link related issues: `Closes #123`, `Fixes #456`
Commit Rules
Commit Frequency:
- Commit after each logical step (not just at end of day)
- Each commit should leave codebase in working state
- "Work in progress" commits OK on feature branches (clean before PR)
Commit Size:
- Max 200 lines changed per commit
- Max 5 files changed per commit (unless related)
- Each commit reviewable in 5 minutes
- Split large changes:
```shell
# BAD: One giant commit
git commit -am "Add federation + fix bugs + refactor + docs"

# GOOD: Separate commits
git commit -m "refactor(network): extract peer discovery logic"
git commit -m "feat(federation): implement cross-swarm voting"
git commit -m "fix(federation): handle peer timeout edge case"
git commit -m "docs: update federation architecture docs"
```
Commit Message Format:
```
type(scope): subject (50 chars or less)

Body (wrap at 72 chars):
- Why this change was made
- What problem it solves
- Any breaking changes or migration notes

Refs: #123, #456
```
Types:
- `feat`: New feature
- `fix`: Bug fix
- `refactor`: Code restructuring (no behavior change)
- `test`: Adding/updating tests
- `docs`: Documentation only
- `chore`: Build, dependencies, tooling
- `perf`: Performance improvement
- `style`: Formatting (no code change)
Subject Rules:
- Use imperative mood: "Add feature" not "Added feature"
- No period at end
- Lowercase after type
- Max 50 characters
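As an illustration, these subject rules could be checked mechanically; the pattern below is a sketch, not an official linter, and cannot verify imperative mood:

```python
import re

# type(scope): subject — scope optional, lowercase subject, no trailing period
COMMIT_RE = re.compile(
    r"^(feat|fix|refactor|test|docs|chore|perf|style)"
    r"(\([a-z0-9_-]+\))?: [a-z].{0,48}[^.]$"
)

def check_subject(line: str) -> bool:
    """Return True if the commit subject follows the stated rules."""
    return len(line) <= 50 and COMMIT_RE.match(line) is not None

assert check_subject("feat(tools): add working directory support")
assert not check_subject("fix: Fixed the thing.")  # capital + trailing period
assert not check_subject("update stuff")           # missing type
```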
Branch Hygiene
DO:
- Create branch from latest main
- Use descriptive branch names
- Push branch to remote immediately
- Rebase onto main regularly
- Delete merged branches
- Squash fixup commits before PR
DON'T:
- Commit directly to main
- Have long-lived branches (>1 week without rebase)
- Include unrelated changes in one branch
- Commit broken code (even temporarily)
- Force push to shared branches
- Merge without review
Handling Conflicts
```shell
# While rebasing
git rebase main
# Conflicts happen...

# Resolve conflicts in files
git add <resolved-files>
git rebase --continue

# If messed up, abort
git rebase --abort
```
Conflict Resolution Rules:
- Understand both changes before resolving
- Don't just pick "ours" or "theirs"
- Test after resolving
- Commit message should explain resolution
Emergency Procedures
Committed to wrong branch:
```shell
# Undo last commit (keep changes)
git reset HEAD~1

# Stash changes
git stash

# Switch to correct branch
git checkout correct-branch

# Apply changes
git stash pop

# Commit properly
git commit -m "..."
```
Need to undo pushed commit:
```shell
# Revert (creates new commit, safe for shared history)
git revert <commit-hash>
git push origin branch-name

# OR, if feature branch not shared yet:
# reset and force push (DANGEROUS)
git reset --hard HEAD~1
git push --force-with-lease origin branch-name
```
Release Process
NOTE: Release process should be handled by AGENT_REVIEW.md agents.
```shell
# Create release branch
git checkout -b release/v1.2.0

# Bump version, update changelog
git commit -m "chore: bump version to 1.2.0"

# Tag release
git tag -a v1.2.0 -m "Release version 1.2.0"
git push origin v1.2.0

# Merge to main
git checkout main
git merge --no-ff release/v1.2.0
git push origin main

# Delete release branch
git branch -d release/v1.2.0
```
What Regular Agents Should NOT Do
REGULAR AGENTS DO NOT:
- Make commits (git commit)
- Create pull requests
- Push to remote repositories
- Merge branches
- Manage git history (rebase, reset, etc.)
- Delete branches
REGULAR AGENTS CAN:
- Create and checkout branches (git checkout -b)
- Stage files for review (git add)
- Switch between branches
REGULAR AGENTS SHOULD:
- Write code and tests
- Run tests locally
- Use logging instead of print()
- Follow code quality standards
- Document changes in code comments or design docs
- Hand off completed work to AGENT_REVIEW.md agent for commit/PR creation
Example Workflow:
1. Agent reads task from user
2. Agent creates feature branch (git checkout -b feature/name)
3. Agent implements feature (writes code, tests, docs)
4. Agent stages changes for review (git add)
5. Agent reports completion with summary of changes
6. AGENT_REVIEW.md agent:
- Reviews code quality
- Makes commits
- Creates PR
Pre-Commit Checklist
- Code passes `pytest` (if tests exist)
- No `print()` statements (use logging)
- No bare `except:` clauses
- All functions have type hints
- All public functions have docstrings
- No TODO comments (create issues instead)
- Token count checked (if modifying prompts)
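Two of these checks (no `print()`, no bare `except:`) can be sketched with Python's `ast` module; treat this as an illustration, not a full linter:

```python
import ast

def find_violations(source: str) -> list:
    """Flag print() calls and bare except clauses with line numbers."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "print"):
            violations.append((node.lineno, "print() call"))
        if isinstance(node, ast.ExceptHandler) and node.type is None:
            violations.append((node.lineno, "bare except"))
    return violations

code = "try:\n    print('debug')\nexcept:\n    pass\n"
assert len(find_violations(code)) == 2
```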
Testing Requirements
Unit Test Coverage
Minimum 80% coverage for:
- Parsing functions
- Business logic
- State machines
Integration Tests Required For:
- API endpoints
- Tool execution
- File operations
- Network calls (mocked)
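A sketch of mocking a network call in a test using `unittest.mock` (the `fetch_status` helper is hypothetical):

```python
from unittest.mock import patch
import urllib.request

def fetch_status(url: str) -> int:
    """Return HTTP status for url (hypothetical helper under test)."""
    with urllib.request.urlopen(url) as resp:
        return resp.status

# Test with the network call mocked out — no real request is made
with patch("urllib.request.urlopen") as mock_open:
    mock_open.return_value.__enter__.return_value.status = 200
    assert fetch_status("https://example.com/health") == 200
    mock_open.assert_called_once_with("https://example.com/health")
```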
Test File Structure
```
tests/
├── unit/
│   ├── test_parser.py
│   ├── test_executor.py
│   └── test_consensus.py
├── integration/
│   ├── test_api.py
│   └── test_tools.py
└── fixtures/
    └── sample_responses.json
```
Code Quality Standards
Python Style
- Follow PEP 8
- Use type hints for all function signatures
- Max line length: 100 characters
- Max function length: 50 lines
- Max file length: 300 lines (split if larger)
Imports (Order Matters)
```python
# 1. Standard library
import os
import sys
from typing import List

# 2. Third party
import numpy as np
from fastapi import APIRouter

# 3. Local (absolute imports only)
from src.tools.executor import ToolExecutor
from src.swarm.manager import SwarmManager
```
Documentation Standards
Every module must have:
```python
"""Module purpose in one line.

Longer description if needed (2-3 sentences max).
"""
```
Every public function must have:
```python
def process_data(data: dict, options: Optional[dict] = None) -> Result:
    """Process data with given options.

    Args:
        data: Input data to process
        options: Processing options (default: None)

    Returns:
        Processed result

    Raises:
        ValueError: If data is invalid
    """
```
Architecture Rules
No Feature Flags in Core Logic
```python
# BAD
if config.get("USE_NEW_PARSER", False):
    result = new_parser(text)
else:
    result = old_parser(text)

# GOOD: Pick one, remove the other
def parse_tool_calls(text: str) -> Tuple[str, List[dict]]:
    """Parse tool calls from text."""
    # Single implementation
```
No Code Duplication
- If you copy-paste > 3 lines, extract to a function
- Shared code goes in `src/common/` or `src/utils/`
Separation of Concerns
```
src/
├── parser/       # Only parsing logic
├── executor/     # Only execution logic
├── formatter/    # Only formatting/output
└── integration/  # Only API glue code
```
Forbidden Patterns
Never Do These:
- Bare except clauses - Always catch specific exceptions
- Production debugging - No `print()`, use logging
- Multiple return formats - One function = one return type
- Silent failures - Always log/report errors
- Magic numbers - Use named constants
- Global state - Use dependency injection
- Deep nesting - Max 3 levels of indentation
- Circular dependencies - Re-architect if needed
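A small sketch contrasting two of these patterns, named constants and specific exceptions; the config-loading function is illustrative:

```python
import json

MAX_RETRIES = 3  # named constant instead of a magic number

def load_config(raw: str) -> dict:
    """Parse config, failing loudly with a specific exception."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:  # specific, not a bare `except:`
        raise ValueError(f"Invalid config JSON: {exc}") from exc

assert load_config('{"retries": 3}') == {"retries": 3}
try:
    load_config("not json")
except ValueError as exc:
    assert "Invalid config JSON" in str(exc)
```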
Review Preparation
Before marking PR ready:
1. Self-Review Checklist (check each item):
   - Tests pass: `pytest -v`
   - Type checking: `mypy src/`
   - Linting: `ruff check src/`
   - Formatting: `black src/`
   - Token count verified (if applicable)
   - No debug code left in
   - Commit messages follow format
   - Documentation updated
2. PR Description Template:
```markdown
## Changes
- [Brief description]

## Testing
- [How you tested it]

## Token Impact (if applicable)
- Before: X tokens
- After: Y tokens
- Change: +/- Z tokens

## Checklist
- [ ] Tests added/updated
- [ ] Documentation updated
- [ ] Self-review completed
```
3. Run Final Verification:
```shell
# Run all checks
pytest && mypy src/ && ruff check src/ && black --check src/
```
Continuous Learning & Research
You MUST periodically research best practices and alternative implementations. This prevents stagnation and ensures we're using proven approaches.
When to Research
Before Major Features:
- Spend 15-30 minutes researching similar implementations
- Check: GitHub, Stack Overflow, official docs, research papers
- Document findings in PR description
Monthly Reviews:
- Review project's core technologies for updates
- Check if better libraries/algorithms exist
- Look for deprecated patterns we're using
When Stuck:
- Don't brute force a solution
- Research how others solved similar problems
- Consider if problem indicates architectural issue
What to Research
1. Best Practices
```
# Search queries to use:
"python async best practices 2024"
"fastapi error handling patterns"
"LLM consensus voting algorithms"
"gguf quantization comparison"
```
2. Similar Implementations
- Search GitHub for similar projects
- Read their architecture decisions
- Check their issues for pitfalls they hit
- Note: Don't copy code blindly, understand WHY
3. Research Papers & Benchmarks
- For consensus algorithms
- For quantization strategies
- For context window optimization
- For distributed systems patterns
4. Library Updates
- Check CHANGELOG of major dependencies
- Review migration guides
- Test new features in separate branch
Documentation of Research
Create research/YYYY-MM-DD-topic.md for significant findings:
```markdown
# Research: [Topic]

**Date:** YYYY-MM-DD
**Researcher:** [Name]
**Trigger:** [Why researched this]

## Findings

### Option 1: [Name]
- Source: [Link]
- Pros: ...
- Cons: ...
- Complexity: Low/Medium/High

### Option 2: [Name]
- Source: [Link]
- Pros: ...
- Cons: ...
- Complexity: Low/Medium/High

## Recommendation
[Which option and WHY]

## Implementation Notes
[Specific code changes needed]

## Risks
[What could go wrong]
```
Research Checklist
Before implementing:
- Searched for similar open-source implementations
- Checked recent best practices (2023+)
- Looked for benchmarking data if applicable
- Reviewed alternative approaches
- Considered long-term maintenance implications
After implementing:
- Documented why chosen approach was selected
- Added comments linking to research sources
- Created test comparing against alternatives (if applicable)
Example Research Topics
Immediate:
- "Python type hints best practices 2024"
- "FastAPI dependency injection patterns"
- "LLM tool use format comparison"
Short-term:
- "Consensus algorithms for distributed LLM systems"
- "Context window compression techniques"
- "GGUF quantization vs other formats"
Long-term:
- "Speculative decoding implementation"
- "PagedAttention for multiple workers"
- "RAG integration patterns"
Research Sources
Reliable:
- Official documentation (Python, FastAPI, etc.)
- Well-maintained GitHub repos (>1k stars, active)
- Recent conference talks (PyCon, NeurIPS, etc.)
- Research papers with code (Papers With Code)
- Official blogs (Python.org, FastAPI.tiangolo.com)
Use with Caution:
- Medium articles (variable quality)
- Old Stack Overflow answers (>2 years)
- Tutorial sites (often outdated)
- YouTube videos (hard to verify)
Integration with Development
Weekly:
- Spend 30 minutes reading about one technology we use
- Note any improvements we could make
- Create issues for promising findings
Monthly:
- Review all open research issues
- Prioritize based on impact vs effort
- Schedule implementation of high-value items
Quarterly:
- Architecture review: Are our patterns still best?
- Dependency audit: Updates needed?
- Performance review: Could we be faster?
Remember:
- Research prevents reinvention of the wheel
- But don't research forever - timebox it (30 min max for most decisions)
- Document findings so others don't repeat the research
- Apply critical thinking - "best practice" depends on context
Breaking This Ruleset
If you MUST break a rule:
- Document WHY in code comments
- Get explicit approval in PR
- Create follow-up issue to fix properly
- Never break Rule 4 (No Production Debugging)
Remember: Quality over speed. A fix that takes 2 days with tests is better than a fix that takes 2 hours and breaks 3 other things.