# Agent Reviewer Rules

> **⚠️ IMPORTANT:** This document is for REVIEW AGENTS who handle commits, PRs, and code reviews.
> Regular agents follow AGENT_WORKER.md for implementation tasks and DO NOT make commits.

## Review Philosophy

**Mission:** Prevent the circular development patterns identified in commit history.

**Standards:**

- Reject code that doesn't meet the quality bar
- Ask for tests; don't accept "I'll add them later"
- Check token counts for prompt changes
- Verify architectural consistency
- Demand clear error messages

**Reviewer Authority:**

- Can block a PR for: missing tests, token bloat, architecture violations
- Cannot approve own code
- Must provide constructive feedback with specific fixes

## Review Checklist

### Phase 1: Structure & Hygiene (Block if failed)

- [ ] **Branch naming follows convention**
  - Format: `type/description` (e.g., `fix/tool-parsing`)
  - Not: `quick-fix`, `temp-branch`, `dev`
- [ ] **Commit messages are clear**
  - Format: `type(scope): description`
  - No: `fix stuff`, `WIP`, `asdf`, `omg finally`
  - Each commit should be reviewable independently
- [ ] **No production debugging code**
  - Search for: `print(`, `console.log`, `debugger`, `TODO`, `FIXME`, `XXX`
  - Check: No commented-out code blocks
  - Check: No temporary files committed
- [ ] **Git history is clean**
  - No "fix typo" commits after the initial commit
  - No "WIP" commits in the PR
  - No merge commits (rebase instead)
  - Squash fixup commits

### Phase 2: Code Quality (Block if failed)

- [ ] **Tests exist and pass**
  - Unit tests for new functions
  - Integration tests for API changes
  - Run: `pytest -v` (must pass)
  - Coverage: ≥80% for new code
  - **BLOCKING:** No tests = No merge
- [ ] **Type hints present**
  - All function parameters typed
  - All return values typed
  - Run: `mypy src/` (must pass with zero errors)
- [ ] **No code smells**
  - No functions > 50 lines
  - No files > 300 lines
  - No indentation > 3 levels deep
  - No circular imports
  - No duplicate code (>3 lines copied)
- [ ] **Minimal,
Maintainable, Modular Code**
  - **Minimal:** Only code needed to solve the problem, no over-engineering
  - **Maintainable:** Clear names, self-documenting, consistent style
  - **Modular:** Single Responsibility Principle, loose coupling, clear interfaces
  - **STRICT ENFORCEMENT:**
    - Functions should do ONE thing (if one does 2+ things, break it up)
    - No monolithic blocks (>50 lines in one function)
    - Clear separation of concerns
    - Interfaces between modules are stable and well-defined
    - Easy to understand for new maintainers
    - No "temp" or "quick" solutions - production quality only
  - **BLOCKING:** Code that is too complex, monolithic, or poorly structured must be rejected
- [ ] **Error handling is robust**
  - No bare `except:` clauses
  - All errors have clear messages
  - No silent failures
  - Edge cases handled
- [ ] **Documentation is adequate**
  - All public functions have docstrings
  - Complex logic has inline comments
  - README updated if user-facing change
  - Architecture doc updated if pattern changes

### Phase 3: Token Budget (Block if failed)

**For any prompt/instruction changes:**

- [ ] **Token count documented**
  - Before: X tokens
  - After: Y tokens
  - Change: +/- Z tokens
- [ ] **Within budget**
  - System prompt + instructions ≤ 2000 tokens (HARD LIMIT)
  - Leaves ≥ 50% of the context window for user input
  - **BLOCKING:** Over budget = Request reduction
- [ ] **Efficient wording**
  - No redundant examples
  - No verbose explanations
  - Prefer code over prose

**Token Counting Command:**

```bash
# Count tokens in a string
echo "Your prompt here" | python -c "import sys; import tiktoken; enc = tiktoken.get_encoding('cl100k_base'); print(len(enc.encode(sys.stdin.read())))"
```

### Phase 4: Architecture (Block if failed)

- [ ] **Consistent with ARCHITECTURE.md**
  - No new patterns without updating docs
  - No mixing of concerns
  - Follows existing module structure
- [ ] **No architecture changes in fixes**
  - Bug fixes should not refactor
  - Refactors should be separate PRs
  - **Exception:**
If a fix requires an arch change, document WHY
- [ ] **Parser rules**
  - Only ONE parser per format
  - No alternative parsing paths
  - Clear regex patterns
  - Handles all documented cases
- [ ] **No feature flags in core**
  - Code should not have `if config.get("ENABLE_X"):`
  - Pick one approach, remove the old one
  - A/B testing only in a separate branch

### Phase 5: Research & Continuous Learning

**For significant changes (>100 lines or new algorithms):**

- [ ] **Research documented**
  - Check the `research/` folder for related findings
  - PR description mentions alternatives considered
  - Links to sources (docs, papers, repos)
  - Not: "I thought this would work"
  - Yes: "Based on [source], this approach handles [case] better than [alternative]"
- [ ] **Best practices followed**
  - Implementation matches current language/framework conventions
  - No deprecated patterns
  - Modern Python features used appropriately (3.9+)
- [ ] **No reinvention**
  - Check if the standard library solves the problem
  - Check if a well-maintained package exists
  - If a custom implementation is needed, document WHY

**Research Documentation Requirements:**

```markdown
## Research
- Alternatives considered: [list]
- Sources: [links]
- Decision: [why chosen approach]
- Benchmarks: [if applicable]
```

### Phase 6: Logic Correctness

- [ ] **Logic is sound**
  - Read through the code
  - Check edge cases
  - Verify error conditions
  - Question anything unclear
- [ ] **No performance regressions**
  - No blocking I/O in async functions (unless wrapped)
  - No memory leaks
  - No N+1 queries
  - Reasonable algorithmic complexity
- [ ] **Security check**
  - No SQL injection vectors
  - No command injection (bash execution sanitized)
  - Path traversal protection (for file ops)
  - No secrets in code

## Review Report Format

After review, write a report to `reports/PR-{number}-{branch}.md`:

```markdown
# Review Report: PR #{number} - {branch}

**Reviewer:** {your name}
**Date:** {YYYY-MM-DD}
**Status:** [APPROVED / CHANGES_REQUESTED / BLOCKED]

## Summary

Brief
description of what this PR does and overall quality assessment.

## Detailed Findings

### ✅ Passed
- [List items that passed review]
- [Be specific: "Tests cover 85% of new code"]

### ⚠️ Warnings (Non-blocking)
- [Minor issues that don't block merge]
- [Style suggestions]
- [Future improvements]

### ❌ Blockers (Must fix)
1. **[Category]** [Specific issue]
   - **Location:** `file.py:123`
   - **Problem:** [What's wrong]
   - **Fix:** [Exactly what to change]
   - **Why:** [Why this matters]
2. **[Category]** [Specific issue]
   - ...

## Token Impact Analysis
- Component: [what changed]
- Before: [X] tokens
- After: [Y] tokens
- Impact: [+/- Z] tokens
- Within budget: [Yes/No]

## Test Coverage
- New code coverage: [X]%
- Tests pass: [Yes/No]
- Integration tests: [Present/Missing]

## Architecture Review
- Follows existing patterns: [Yes/No]
- Introduces new dependencies: [List if any]
- Breaking changes: [Yes/No - explain if yes]

## Research Review
- Alternatives considered: [Listed/None]
- Sources cited: [Yes/No]
- Best practices followed: [Yes/No]
- Research documented: [Yes/No - location]

## Code Quality Score
- Structure: [0-10]
- Testing: [0-10]
- Documentation: [0-10]
- Logic: [0-10]
- **Overall: [0-10]**

## Action Items
- [ ] [Specific fix needed]
- [ ] [Specific fix needed]
- [ ] [Test to add]

## Verdict
[APPROVED / CHANGES_REQUESTED / BLOCKED]

**If CHANGES_REQUESTED:**
- Address all blockers
- Re-request review when ready

**If BLOCKED:**
- Major issues require architecture discussion
- Schedule meeting before continuing
```

## Severity Levels

### 🔴 BLOCKING (Cannot merge)
- Missing tests for new functionality
- Token budget exceeded
- Bare `except:` clauses
- Production debugging code (`print` statements)
- Breaking changes without documentation
- Security vulnerabilities
- Tests failing
- Type check errors
- Architecture violations

### 🟡 CHANGES_REQUESTED (Fix before merge)
- Unclear variable names
- Missing docstrings
- Inefficient algorithms
- Missing error handling
-
Unclear commit messages
- Minor style issues

### 🟢 APPROVED (Optional suggestions)
- Style preferences
- Future improvements
- Optional refactors

## Common Issues to Watch For

### Issue 1: Tool Parsing Duplication

```python
# ❌ WRONG - Multiple parsers
def parse_tools_v1(text): ...
def parse_tools_v2(text): ...
def parse_tools_legacy(text): ...

# ✅ CORRECT - Single parser
TOOL_PATTERN = r'TOOL:\s*(\w+)\s*\nARGUMENTS:\s*(\{[^}]*\})'
```

**Check:** Search for "def parse" - there should be ONE per format.

### Issue 2: Token Bloat

```python
# ❌ WRONG - Too verbose
SYSTEM_PROMPT = """
You are an AI assistant. Here are detailed instructions...
[2000 words of explanation]
[10 examples]
"""

# ✅ CORRECT - Concise
SYSTEM_PROMPT = """Use TOOL: name\nARGUMENTS: {...} format. Available: read, write, bash."""
```

**Check:** Count tokens, verify < 2000.

### Issue 3: Architecture Drift

```python
# ❌ WRONG - Mixing concerns in one file
# src/api/routes.py
def handle_request(): ...
def parse_tools(): ...
def execute_tool(): ...
def format_response(): ...

# ✅ CORRECT - Separated
# src/api/routes.py - only HTTP handling
# src/tools/parser.py - only parsing
# src/tools/executor.py - only execution
```

**Check:** Each module has ONE responsibility.

### Issue 4: Debug Code Left In

```python
# ❌ WRONG
def process(data):
    print(f"DEBUG: data={data}")  # REMOVE THIS
    result = transform(data)
    print(f"DEBUG: result={result}")  # REMOVE THIS
    return result

# ✅ CORRECT
import logging

logger = logging.getLogger(__name__)

def process(data):
    logger.debug("Processing data", extra={"data_size": len(data)})
    return transform(data)
```

**Check:** `grep -r "print(" src/ --include="*.py" | grep -v "^#"`

### Issue 5: Missing Error Context

```python
# ❌ WRONG
raise ValueError("Invalid input")

# ✅ CORRECT
raise ValueError(f"Invalid model format: '{model_str}'. Expected: 'name:size:quant' (e.g., 'qwen:7b:q4')")
```

**Check:** All errors explain what was expected vs. what was received.

## Review Workflow

1.
**First Pass: Structure** (5 min)
   - Check branch name, commits, no debug code
   - If failed → Write report, BLOCK
2. **Second Pass: Quality** (10 min)
   - Run tests, check types, review code
   - If failed → Write report, CHANGES_REQUESTED
3. **Third Pass: Deep Dive** (15 min)
   - Read logic, check edge cases
   - Verify token counts
   - Check architecture
   - Write detailed report
4. **Final Decision** (5 min)
   - APPROVE / CHANGES_REQUESTED / BLOCK
   - Write report to the `reports/` folder
   - Post summary in PR comments

**Total time per review: 30-35 minutes**

## Reviewer Self-Check

Before submitting a review:

- [ ] I ran all tests locally
- [ ] I checked type hints
- [ ] I counted tokens (if applicable)
- [ ] I read every line of changed code
- [ ] My feedback is specific and actionable
- [ ] I explained WHY for each blocker
- [ ] I wrote a report to the `reports/` folder

## Escalation

Escalate to an architecture discussion if:

- The PR changes core patterns
- The token budget cannot be met
- Two reviewers disagree
- Breaking changes are proposed

**Don't just approve to be nice.**
**Don't let technical debt accumulate.**

## Report Storage

All reports go in the `reports/` folder:

```
reports/
├── PR-123-fix-tool-parsing.md
├── PR-124-add-federation.md
├── PR-125-refactor-consensus.md
└── README.md    # Index of all reviews
```

**This folder is gitignored - reports stay local.**

Generate the index with:

```bash
ls -1 reports/PR-*.md | sort -t'-' -k2 -n > reports/README.md
```

---

**Remember: You're the last line of defense against technical debt. Be thorough, be kind, be strict.**
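The mechanical parts of the first pass (debug prints, bare `except:` clauses, leftover `TODO`/`FIXME`/`XXX` markers) can be scripted so reviewers spend their time on logic instead. A minimal sketch, assuming plain regex scanning of Python source; `scan_source` and `BLOCKING_PATTERNS` are hypothetical names, not part of any existing tooling in this repo:

```python
import re

# Patterns that block a merge per Phases 1-2 (assumed approximations,
# not an exhaustive implementation of the checklist).
BLOCKING_PATTERNS = {
    "debug print": re.compile(r"^\s*print\("),
    "bare except": re.compile(r"^\s*except\s*:"),
    "leftover marker": re.compile(r"#\s*(TODO|FIXME|XXX)\b"),
}

def scan_source(source: str) -> list[tuple[int, str]]:
    """Return (line_number, issue_label) pairs for blocking patterns."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for label, pattern in BLOCKING_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, label))
    return findings
```

Note that a regex scan is deliberately crude: it will not flag `except Exception:` (which may still deserve a comment) and cannot replace reading the diff; it only guarantees the grep-level blockers never reach a human reviewer.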