# Agent Worker Rules

> **⚠️ IMPORTANT:** This document is for IMPLEMENTATION AGENTS (coding, testing, documentation).
> **DO NOT MAKE COMMITS** - that's the AGENT_REVIEW.md agent's job.

## Pre-Flight Checklist (MUST complete before coding)

### ⚠️ GIT OPERATIONS REMINDER

**DO NOT make commits.** Commits are ONLY handled by AGENT_REVIEW.md agents. You CAN create branches and stage files (git add), but DO NOT commit (git commit).

### 1. Token Budget Verification

- [ ] System prompt + instructions ≤ 2000 tokens (hard limit)
- [ ] Leave ≥ 50% of context window for user input
- [ ] If adding documentation/examples, remove old ones to maintain budget
- [ ] Use `tiktoken` or estimate: ~4 chars = 1 token

### 2. Test Plan Required

Before writing ANY code, write a test plan:

```markdown
## Test Plan for [Feature]

### Unit Tests
- [ ] Test case 1: [specific input] → [expected output]
- [ ] Test case 2: [edge case]
- [ ] Test case 3: [error condition]

### Integration Tests
- [ ] End-to-end flow: [steps]
- [ ] Expected result: [what success looks like]

### Manual Verification
- [ ] Command to run: [exact command]
- [ ] Expected output: [what to see]
```

### 3. Design Decision Document

For any change > 50 lines:

```markdown
## Design Decision

### Problem
[What are we solving?]

### Options Considered
1. [Option A] - Pros: ..., Cons: ...
2. [Option B] - Pros: ..., Cons: ...

### Decision
[Which option and WHY]

### Impact
- Token count change: [+/- X tokens]
- Breaking changes: [Yes/No]
- Migration needed: [Yes/No]
```

## Coding Rules

### Rule 1: One Feature = One Commit

**NOTE:** Regular agents DO NOT make commits. AGENT_REVIEW.md agents handle commits.
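The token-budget heuristic from the pre-flight checklist (~4 chars = 1 token, `tiktoken` when available) can be sketched as a small helper. This is a sketch, not project code: the `cl100k_base` encoding choice is an assumption about which tokenizer applies, and the hypothetical `within_budget` name is illustrative only.

```python
import math


def estimate_tokens(text: str) -> int:
    """Estimate token count: tiktoken if installed, else the ~4 chars/token heuristic."""
    try:
        import tiktoken  # optional dependency; encoding choice is an assumption
        return len(tiktoken.get_encoding("cl100k_base").encode(text))
    except ImportError:
        return math.ceil(len(text) / 4)


def within_budget(prompt: str, limit: int = 2000) -> bool:
    """Check a prompt against the 2000-token hard limit from the checklist."""
    return estimate_tokens(prompt) <= limit
```

Either path is only an estimate; when modifying system prompts, verify the count with the actual tokenizer your runtime uses.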
When AGENT_REVIEW.md agents make commits:

- Never combine unrelated changes in one commit
- If you fix a bug AND refactor, make 2 commits
- Commit message format: `type(scope): description`
- Types: `feat`, `fix`, `refactor`, `test`, `docs`, `chore`
- Example: `feat(tools): add working directory support`

### Rule 2: Tests First (TDD)

```python
# BAD: Write code, maybe test later
def parse_tools(text):
    # ... implementation ...
    pass

# GOOD: Write test first
def test_parse_simple_tool():
    text = 'TOOL: read\nARGUMENTS: {"filePath": "test.txt"}'
    content, tools = parse_tool_calls(text)
    assert len(tools) == 1
    assert tools[0]["function"]["name"] == "read"
# Then write minimal code to pass
```

### Rule 3: Minimal, Maintainable, Modular Code

**Core Focus:** Keep code minimal, maintainable, and modular.

#### Minimal
- Write only the code needed to solve the problem
- Avoid unnecessary abstractions or over-engineering
- Keep functions small and focused (max 50 lines)
- Prefer simple solutions over complex ones
- Remove dead code and unused imports immediately

#### Maintainable
- Clear, descriptive variable and function names
- One concept per file/module
- Self-documenting code with minimal comments
- Consistent code style throughout
- Easy to understand for future maintainers

#### Modular
- Single Responsibility Principle: One purpose per module/function
- Loose coupling between components
- Clear, stable interfaces between modules
- Easy to test in isolation
- Reusable components where appropriate

```python
# BAD: Monolithic, complex, hard to maintain
def process_user_request(request_data, validate=True, save=True,
                         notify=True, format_output=False):
    # 200+ lines doing everything
    validation_result = validate_request(request_data)
    if validation_result.is_valid:
        if save:
            db_connection = get_db_connection()
            cursor = db_connection.cursor()
            cursor.execute("INSERT INTO requests ...", request_data)
            db_connection.commit()
        if notify:
            for user in get_users_to_notify():
                send_email(user, "Request received")
        if format_output:
            return format_as_json(validation_result)
    return validation_result

# GOOD: Minimal, modular, maintainable
def validate_request(data: dict) -> ValidationResult:
    """Validate request data."""
    return ValidationResult(is_valid=len(data) > 0)

def save_request(data: dict) -> str:
    """Save request to database."""
    return db.insert("requests", data)

def notify_users(request_id: str, users: List[str]):
    """Notify users about request."""
    for user in users:
        send_email(user, f"Request {request_id} received")
```

### Rule 4: No Production Debugging

- NEVER add `print()` statements for debugging
- Use `logging` module with appropriate levels
- Remove ALL debug logging before committing
- Exception: Structured logging for observability (metrics, errors)

```python
# BAD
def process_request(request):
    print(f"DEBUG: Got request {request}")  # REMOVE THIS
    result = handle(request)
    print(f"DEBUG: Result {result}")  # REMOVE THIS
    return result

# GOOD
def process_request(request):
    logger.debug("Processing request", extra={"request_id": request.id})
    result = handle(request)
    return result
```

### Rule 5: Architecture Consistency

- Check ARCHITECTURE.md before changing patterns
- If unsure, ask in PR description
- NEVER change architecture in a "fix" commit
- Architecture changes require design doc + team review

### Rule 6: Parse Once, Parse Well

- ONE parser per format
- If adding new format, remove old one
- Parser must handle all documented cases
- Parser must fail gracefully (return empty, not crash)

```python
# BAD: Multiple parsers for same thing
def parse_tools_v1(text): ...
def parse_tools_v2(text): ...
def parse_tools_legacy(text): ...

# GOOD: Single parser with clear regex
TOOL_PATTERN = r'TOOL:\s*(\w+)\s*\nARGUMENTS:\s*(\{[^}]*\})'

def parse_tool_calls(text: str) -> Tuple[str, List[dict]]:
    matches = list(re.finditer(TOOL_PATTERN, text, re.IGNORECASE))
    if not matches:
        return text, []
    # ... rest of parsing ...
```

### Rule 7: Token-Aware Documentation

- Every docstring/example has a token cost
- Count tokens before adding
- If over budget, remove something else
- Prioritize: Code clarity > Examples > Explanations

```python
# BAD: 150 tokens of fluff
def calculate(x, y):
    """
    This function calculates the sum of two numbers. The sum is
    calculated by using the built-in Python addition operator
    which adds the values together.

    Args:
        x (int): The first number to add
        y (int): The second number to add

    Returns:
        int: The sum of x and y

    Example:
        >>> calculate(1, 2)
        3
    """
    return x + y

# GOOD: 20 tokens, clear enough
def calculate(x: int, y: int) -> int:
    """Return sum of x and y."""
    return x + y
```

### Rule 8: Clear Error Messages

- Every error must tell user EXACTLY what went wrong
- Include context: what was expected vs what was received
- Suggest fix if possible

```python
# BAD
raise ValueError("Invalid input")

# GOOD
raise ValueError(
    f"Invalid model format: '{model_str}'. "
    f"Expected: 'name:size:quant' (e.g., 'qwen:7b:q4')"
)
```

### Rule 9: No Circular Imports

```python
# BAD: src/a.py imports src/b.py, src/b.py imports src/a.py

# GOOD: Use dependency injection or move shared code to common module
```

## Git Workflow Rules

### CRITICAL: Commit Handling

**REGULAR AGENTS: DO NOT MAKE COMMITS**

- Regular agents do NOT create commits, pull requests, or manage git history
- Commits are ONLY handled by agents following AGENT_REVIEW.md guidelines
- If you need to commit code, the AGENT_REVIEW.md agent should handle it
- Exception: You may manually stage files (git add) for the review agent
- **You CAN create and checkout branches** (that's fine) - just don't commit to them

### Branch Strategy

**Main Branches (Protected):**
- `main` - Production-ready code only
- `develop` - Integration branch for features (optional for small projects)

**Working Branches (Temporary - AGENT_REVIEW.md ONLY):**
```
feature/description     # New features
fix/description         # Bug fixes
refactor/description    # Code refactoring
hotfix/description      # Critical production fixes
docs/description        # Documentation only
experiment/description  # Experimental work (may be deleted)
```

**Note:** Regular agents may create branches, but all other git operations (commits, merges, pushes) belong to AGENT_REVIEW.md agents.

### Workflow Steps

#### 1. Starting New Work

```bash
# ALWAYS start from main
git checkout main
git pull origin main

# Create feature branch
git checkout -b feature/description

# Push branch to remote immediately
git push -u origin feature/description
```

#### 2. During Development

```bash
# Commit often (small, logical commits)
git add -p  # Stage interactively (review each change)
git commit -m "feat(scope): description"

# Push regularly (backup)
git push origin feature/description

# Keep up-to-date with main
git fetch origin
git rebase origin/main
# Resolve conflicts immediately
```

#### 3. Before PR (Final Cleanup)

```bash
# Interactive rebase to clean history
git rebase -i main

# Squash these:
# - "fix typo"
# - "WIP"
# - "asdf"
# - "omg finally"
# - Multiple attempts at same fix

# Keep separate:
# - Logical feature steps
# - Refactoring separate from features
# - Test additions separate from code changes
```

#### 4. Creating PR

- Push final branch: `git push origin feature/description`
- Create PR to `main` (not develop unless project uses git-flow)
- Fill PR template completely
- Request review from AGENT_REVIEW.md qualified reviewer
- Link related issues: `Closes #123`, `Fixes #456`

### Commit Rules

**Commit Frequency:**
- Commit after each logical step (not just at end of day)
- Each commit should leave codebase in working state
- "Work in progress" commits OK on feature branches (clean before PR)

**Commit Size:**
- Max 200 lines changed per commit
- Max 5 files changed per commit (unless related)
- Each commit reviewable in 5 minutes
- Split large changes:

```bash
# BAD: One giant commit
git commit -am "Add federation + fix bugs + refactor + docs"

# GOOD: Separate commits
git commit -m "refactor(network): extract peer discovery logic"
git commit -m "feat(federation): implement cross-swarm voting"
git commit -m "fix(federation): handle peer timeout edge case"
git commit -m "docs: update federation architecture docs"
```

**Commit Message Format:**
```
type(scope): subject (50 chars or less)

Body (wrap at 72 chars):
- Why this change was made
- What problem it solves
- Any breaking changes or migration notes

Refs: #123, #456
```

**Types:**
- `feat`: New feature
- `fix`: Bug fix
- `refactor`: Code restructuring (no behavior change)
- `test`: Adding/updating tests
- `docs`: Documentation only
- `chore`: Build, dependencies, tooling
- `perf`: Performance improvement
- `style`: Formatting (no code change)

**Subject Rules:**
- Use imperative mood: "Add feature" not "Added feature"
- No period at end
- Lowercase after type
- Max 50 characters

### Branch Hygiene

**DO:**
- Create branch from latest main
- Use descriptive branch names
- Push branch to remote immediately
- Rebase onto main regularly
- Delete merged branches
- Squash fixup commits before PR

**DON'T:**
- Commit directly to main
- Have long-lived branches (>1 week without rebase)
- Include unrelated changes in one branch
- Commit broken code (even temporarily)
- Force push to shared branches
- Merge without review

### Handling Conflicts

```bash
# While rebasing
git rebase main
# Conflicts happen...

# Resolve conflicts in files, then stage them
git add <resolved-files>
git rebase --continue

# If messed up, abort
git rebase --abort
```

**Conflict Resolution Rules:**
1. Understand both changes before resolving
2. Don't just pick "ours" or "theirs"
3. Test after resolving
4. Commit message should explain resolution

### Emergency Procedures

**Committed to wrong branch:**
```bash
# Undo last commit (keep changes)
git reset HEAD~1

# Stash changes
git stash

# Switch to correct branch
git checkout correct-branch

# Apply changes
git stash pop

# Commit properly
git commit -m "..."
```

**Need to undo pushed commit:**
```bash
# Revert (creates new commit, safe for shared history)
git revert <commit>
git push origin branch-name

# OR if feature branch not shared yet:
# Reset and force push (DANGEROUS)
git reset --hard HEAD~1
git push --force-with-lease origin branch-name
```

### Release Process

**NOTE:** Release process should be handled by AGENT_REVIEW.md agents.

```bash
# Create release branch
git checkout -b release/v1.2.0

# Bump version, update changelog
git commit -m "chore: bump version to 1.2.0"

# Tag release
git tag -a v1.2.0 -m "Release version 1.2.0"
git push origin v1.2.0

# Merge to main
git checkout main
git merge --no-ff release/v1.2.0
git push origin main

# Delete release branch
git branch -d release/v1.2.0
```

### What Regular Agents Should NOT Do

**REGULAR AGENTS DO NOT:**
- Make commits (git commit)
- Create pull requests
- Push to remote repositories
- Merge branches
- Manage git history (rebase, reset, etc.)
- Delete branches

**REGULAR AGENTS CAN:**
- Create and checkout branches (git checkout -b)
- Stage files for review (git add)
- Switch between branches

**REGULAR AGENTS SHOULD:**
- Write code and tests
- Run tests locally
- Use logging instead of print()
- Follow code quality standards
- Document changes in code comments or design docs
- Hand off completed work to AGENT_REVIEW.md agent for commit/PR creation

**Example Workflow:**
```
1. Agent reads task from user
2. Agent creates feature branch (git checkout -b feature/name)
3. Agent implements feature (writes code, tests, docs)
4. Agent stages changes for review (git add)
5. Agent reports completion with summary of changes
6. AGENT_REVIEW.md agent:
   - Reviews code quality
   - Makes commits
   - Creates PR
```

### Pre-Commit Checklist

- [ ] Code passes `pytest` (if tests exist)
- [ ] No `print()` statements (use logging)
- [ ] No bare `except:` clauses
- [ ] All functions have type hints
- [ ] All public functions have docstrings
- [ ] No TODO comments (create issues instead)
- [ ] Token count checked (if modifying prompts)

## Testing Requirements

### Unit Test Coverage

Minimum 80% coverage for:
- Parsing functions
- Business logic
- State machines

### Integration Tests Required For

- API endpoints
- Tool execution
- File operations
- Network calls (mocked)

### Test File Structure

```
tests/
├── unit/
│   ├── test_parser.py
│   ├── test_executor.py
│   └── test_consensus.py
├── integration/
│   ├── test_api.py
│   └── test_tools.py
└── fixtures/
    └── sample_responses.json
```

## Code Quality Standards

### Python Style

- Follow PEP 8
- Use type hints for all function signatures
- Max line length: 100 characters
- Max function length: 50 lines
- Max file length: 300 lines (split if larger)

### Imports (Order Matters)

```python
# 1. Standard library
import os
import sys
from typing import List

# 2. Third party
import numpy as np
from fastapi import APIRouter

# 3. Local (absolute imports only)
from src.tools.executor import ToolExecutor
from src.swarm.manager import SwarmManager
```

### Documentation Standards

Every module must have:

```python
"""Module purpose in one line.

Longer description if needed (2-3 sentences max).
"""
```

Every public function must have:

```python
def process_data(data: dict, options: Optional[dict] = None) -> Result:
    """Process data with given options.

    Args:
        data: Input data to process
        options: Processing options (default: None)

    Returns:
        Processed result

    Raises:
        ValueError: If data is invalid
    """
```

## Architecture Rules

### No Feature Flags in Core Logic

```python
# BAD
if config.get("USE_NEW_PARSER", False):
    result = new_parser(text)
else:
    result = old_parser(text)

# GOOD: Pick one, remove the other
def parse_tool_calls(text: str) -> Tuple[str, List[dict]]:
    """Parse tool calls from text."""
    # Single implementation
```

### No Code Duplication

- If you copy-paste > 3 lines, extract to function
- Shared code goes in `src/common/` or `src/utils/`

### Separation of Concerns

```
src/
├── parser/       # Only parsing logic
├── executor/     # Only execution logic
├── formatter/    # Only formatting/output
└── integration/  # Only API glue code
```

## Forbidden Patterns

### Never Do These:

1. **Bare except clauses** - Always catch specific exceptions
2. **Production debugging** - No `print()`, use logging
3. **Multiple return formats** - One function = one return type
4. **Silent failures** - Always log/report errors
5. **Magic numbers** - Use named constants
6. **Global state** - Use dependency injection
7. **Deep nesting** - Max 3 levels of indentation
8. **Circular dependencies** - Re-architect if needed

## Review Preparation

Before marking PR ready:

1. **Self-Review Checklist** (check each item):
   - [ ] Tests pass: `pytest -v`
   - [ ] Type checking: `mypy src/`
   - [ ] Linting: `ruff check src/`
   - [ ] Formatting: `black src/`
   - [ ] Token count verified (if applicable)
   - [ ] No debug code left in
   - [ ] Commit messages follow format
   - [ ] Documentation updated

2. **PR Description Template**:

```markdown
## Changes
- [Brief description]

## Testing
- [How you tested it]

## Token Impact (if applicable)
- Before: X tokens
- After: Y tokens
- Change: +/- Z tokens

## Checklist
- [ ] Tests added/updated
- [ ] Documentation updated
- [ ] Self-review completed
```

3. **Run Final Verification**:

```bash
# Run all checks
pytest && mypy src/ && ruff check src/ && black --check src/
```

## Continuous Learning & Research

You MUST periodically research best practices and alternative implementations. This prevents stagnation and ensures we're using proven approaches.

### When to Research

**Before Major Features:**
- Spend 15-30 minutes researching similar implementations
- Check: GitHub, Stack Overflow, official docs, research papers
- Document findings in PR description

**Monthly Reviews:**
- Review project's core technologies for updates
- Check if better libraries/algorithms exist
- Look for deprecated patterns we're using

**When Stuck:**
- Don't brute force a solution
- Research how others solved similar problems
- Consider if problem indicates architectural issue

### What to Research

**1. Best Practices**
```bash
# Search queries to use:
"python async best practices 2024"
"fastapi error handling patterns"
"LLM consensus voting algorithms"
"gguf quantization comparison"
```

**2. Similar Implementations**
- Search GitHub for similar projects
- Read their architecture decisions
- Check their issues for pitfalls they hit
- Note: Don't copy code blindly, understand WHY

**3. Research Papers & Benchmarks**
- For consensus algorithms
- For quantization strategies
- For context window optimization
- For distributed systems patterns

**4. Library Updates**
- Check CHANGELOG of major dependencies
- Review migration guides
- Test new features in separate branch

### Documentation of Research

Create `research/YYYY-MM-DD-topic.md` for significant findings:

```markdown
# Research: [Topic]

**Date:** YYYY-MM-DD
**Researcher:** [Name]
**Trigger:** [Why researched this]

## Findings

### Option 1: [Name]
- Source: [Link]
- Pros: ...
- Cons: ...
- Complexity: Low/Medium/High

### Option 2: [Name]
- Source: [Link]
- Pros: ...
- Cons: ...
- Complexity: Low/Medium/High

## Recommendation
[Which option and WHY]

## Implementation Notes
[Specific code changes needed]

## Risks
[What could go wrong]
```

### Research Checklist

**Before implementing:**
- [ ] Searched for similar open-source implementations
- [ ] Checked recent best practices (2023+)
- [ ] Looked for benchmarking data if applicable
- [ ] Reviewed alternative approaches
- [ ] Considered long-term maintenance implications

**After implementing:**
- [ ] Documented why chosen approach was selected
- [ ] Added comments linking to research sources
- [ ] Created test comparing against alternatives (if applicable)

### Example Research Topics

**Immediate:**
- "Python type hints best practices 2024"
- "FastAPI dependency injection patterns"
- "LLM tool use format comparison"

**Short-term:**
- "Consensus algorithms for distributed LLM systems"
- "Context window compression techniques"
- "GGUF quantization vs other formats"

**Long-term:**
- "Speculative decoding implementation"
- "PagedAttention for multiple workers"
- "RAG integration patterns"

### Research Sources

**Reliable:**
- Official documentation (Python, FastAPI, etc.)
- Well-maintained GitHub repos (>1k stars, active)
- Recent conference talks (PyCon, NeurIPS, etc.)
- Research papers with code (Papers With Code)
- Official blogs (Python.org, FastAPI.tiangolo.com)

**Use with Caution:**
- Medium articles (variable quality)
- Old Stack Overflow answers (>2 years)
- Tutorial sites (often outdated)
- YouTube videos (hard to verify)

### Integration with Development

**Weekly:**
- Spend 30 minutes reading about one technology we use
- Note any improvements we could make
- Create issues for promising findings

**Monthly:**
- Review all open research issues
- Prioritize based on impact vs effort
- Schedule implementation of high-value items

**Quarterly:**
- Architecture review: Are our patterns still best?
- Dependency audit: Updates needed?
- Performance review: Could we be faster?

---

**Remember:**
- Research prevents reinvention of the wheel
- But don't research forever - timebox it (30 min max for most decisions)
- Document findings so others don't repeat the research
- Apply critical thinking - "best practice" depends on context

---

## Breaking This Ruleset

If you MUST break a rule:
1. Document WHY in code comments
2. Get explicit approval in PR
3. Create follow-up issue to fix properly
4. Never break Rule 4 (No Production Debugging)

---

**Remember: Quality over speed. A fix that takes 2 days with tests is better than a fix that takes 2 hours and breaks 3 other things.**