Agent Reviewer Rules

⚠️ IMPORTANT: This document is for REVIEW AGENTS who handle commits, PRs, and code reviews. Regular agents follow AGENT_WORKER.md for implementation tasks and DO NOT make commits.

Review Philosophy

Mission: Prevent the circular development patterns identified in commit history.

Standards:

  • Reject code that doesn't meet the quality bar
  • Ask for tests, don't accept "I'll add them later"
  • Check token counts for prompt changes
  • Verify architectural consistency
  • Demand clear error messages

Reviewer Authority:

  • Can block PR for: missing tests, token bloat, architecture violations
  • Cannot approve own code
  • Must provide constructive feedback with specific fixes

Review Checklist

Phase 1: Structure & Hygiene (Block if failed)

  • Branch naming follows convention

    • Format: type/description (e.g., fix/tool-parsing)
    • Not: quick-fix, temp-branch, dev
  • Commit messages are clear

    • Format: type(scope): description
    • No: fix stuff, WIP, asdf, omg finally
    • Each commit should be reviewable independently
  • No production debugging code (a scripted sweep is sketched after this checklist)

    • Search for: print(, console.log, debugger, TODO, FIXME, XXX
    • Check: No commented-out code blocks
    • Check: No temporary files committed
  • Git history is clean

    • No "fix typo" commits after initial commit
    • No "WIP" commits in PR
    • No merge commits (rebase instead)
    • Squash fixup commits
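
A scripted version of the debug-code sweep (a minimal sketch: the src/ root and the marker list are taken from this checklist, not from any repo config):

# Phase 1 sweep: flag debug leftovers in Python sources under src/.
# Reports every match for human review; it does not judge context.
import re
from pathlib import Path

MARKERS = re.compile(r"print\(|console\.log|debugger|TODO|FIXME|XXX")

for path in Path("src").rglob("*.py"):
    text = path.read_text(encoding="utf-8", errors="ignore")
    for lineno, line in enumerate(text.splitlines(), start=1):
        if MARKERS.search(line):
            print(f"{path}:{lineno}: {line.strip()}")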

Phase 2: Code Quality (Block if failed)

  • Tests exist and pass

    • Unit tests for new functions
    • Integration tests for API changes
    • Run: pytest -v (must pass)
    • Coverage: ≥80% for new code
    • BLOCKING: No tests = No merge
  • Type hints present (see the sketch after this checklist)

    • All function parameters typed
    • All return values typed
    • Run: mypy src/ (must pass with zero errors)
  • No code smells

    • No functions > 50 lines
    • No files > 300 lines
    • No indentation > 3 levels deep
    • No circular imports
    • No duplicate code (>3 lines copied)
  • Minimal, Maintainable, Modular Code

    • Minimal: Only code needed to solve the problem, no over-engineering
    • Maintainable: Clear names, self-documenting, consistent style
    • Modular: Single Responsibility Principle, loose coupling, clear interfaces
    • STRICT ENFORCEMENT:
      • Functions should do ONE thing (if it does 2+ things, break it up)
      • No monolithic blocks (>50 lines in one function)
      • Clear separation of concerns
      • Interfaces between modules are stable and well-defined
      • Easy to understand for new maintainers
      • No "temp" or "quick" solutions - production quality only
    • BLOCKING: Code that is too complex, monolithic, or poorly structured must be rejected
  • Error handling is robust

    • No bare except: clauses
    • All errors have clear messages
    • No silent failures
    • Edge cases handled
  • Documentation is adequate

    • All public functions have docstrings
    • Complex logic has inline comments
    • README updated if user-facing change
    • Architecture doc updated if pattern changes
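
A sketch of code that clears this bar, reusing the 'name:size:quant' model format from the error-message example later in this document (the function itself is illustrative, not from the codebase):

def parse_model_spec(spec: str) -> tuple[str, str, str]:
    """Split a 'name:size:quant' model spec into (name, size, quant).

    Raises:
        ValueError: if the spec does not have exactly three fields.
    """
    parts = spec.split(":")
    if len(parts) != 3:
        raise ValueError(
            f"Invalid model format: '{spec}'. "
            f"Expected 'name:size:quant' (e.g., 'qwen:7b:q4')"
        )
    name, size, quant = parts
    return name, size, quant

One function, one job, typed at both ends, and an error that states expected versus received.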

Phase 3: Token Budget (Block if failed)

For any prompt/instruction changes:

  • Token count documented

    • Before: X tokens
    • After: Y tokens
    • Change: +/- Z tokens
  • Within budget

    • System prompt + instructions ≤ 2000 tokens (HARD LIMIT)
    • Leaves ≥ 50% of the context window for user input
    • BLOCKING: Over budget = Request reduction
  • Efficient wording

    • No redundant examples
    • No verbose explanations
    • Prefer code over prose

Token Counting Command:

# Count tokens in a string
echo "Your prompt here" | python -c "import sys; import tiktoken; enc = tiktoken.get_encoding('cl100k_base'); print(len(enc.encode(sys.stdin.read())))"

Phase 4: Architecture (Block if failed)

  • Consistent with ARCHITECTURE.md

    • No new patterns without updating docs
    • No mixing of concerns
    • Follows existing module structure
  • No architecture changes in fixes

    • Bug fixes should not refactor
    • Refactors should be separate PRs
    • Exception: If a fix requires an architecture change, document WHY
  • Parser rules

    • Only ONE parser per format
    • No alternative parsing paths
    • Clear regex patterns
    • Handles all documented cases
  • No feature flags in core (anti-pattern sketched after this checklist)

    • Code should not have if config.get("ENABLE_X"):
    • Pick one approach, remove old one
    • A/B testing only in separate branch
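
The flag anti-pattern in miniature (a sketch: config, ENABLE_NEW_PARSER, and the parser names are illustrative):

config: dict = {}  # stands in for the real runtime configuration object

def parse_v1(text: str) -> dict: ...
def parse_v2(text: str) -> dict: ...

# ❌ WRONG - a feature flag keeps two parsers alive in core
def parse_with_flag(text: str) -> dict:
    if config.get("ENABLE_NEW_PARSER"):
        return parse_v2(text)
    return parse_v1(text)

# ✅ CORRECT - one approach kept, the losing path deleted in the same PR
def parse(text: str) -> dict:
    return parse_v2(text)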

Phase 5: Research & Continuous Learning

For significant changes (>100 lines or new algorithms):

  • Research documented

    • Check research/ folder for related findings
    • PR description mentions alternatives considered
    • Links to sources (docs, papers, repos)
    • Not: "I thought this would work"
    • Yes: "Based on [source], this approach handles [case] better than [alternative]"
  • Best practices followed

    • Implementation matches current language/framework conventions
    • No deprecated patterns
    • Modern Python features used appropriately (3.9+)
  • No reinvention

    • Check if standard library solves the problem
    • Check if well-maintained package exists
    • If custom implementation needed, document WHY

Research Documentation Requirements:

## Research
- Alternatives considered: [list]
- Sources: [links]
- Decision: [why chosen approach]
- Benchmarks: [if applicable]

Phase 6: Logic Correctness

  • Logic is sound

    • Read through the code
    • Check edge cases
    • Verify error conditions
    • Question anything unclear
  • No performance regressions (async I/O sketch after this checklist)

    • No blocking I/O in async functions (unless wrapped)
    • No memory leaks
    • No N+1 queries
    • Reasonable algorithmic complexity
  • Security check (path guard sketched after this checklist)

    • No SQL injection vectors
    • No command injection (bash execution sanitized)
    • Path traversal protection (for file ops)
    • No secrets in code
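
Sketches for the two checks above (function names are illustrative; requests stands in for any blocking client; asyncio.to_thread and Path.is_relative_to need Python 3.9+):

import asyncio
from pathlib import Path

import requests  # any blocking HTTP client, shown as the thing to wrap

# ❌ WRONG - blocking I/O stalls the event loop
async def fetch_blocking(url: str) -> str:
    return requests.get(url, timeout=10).text

# ✅ CORRECT - blocking call pushed to a worker thread (Python 3.9+)
async def fetch(url: str) -> str:
    response = await asyncio.to_thread(requests.get, url, timeout=10)
    return response.text

# ✅ Path traversal guard: resolve first, then confirm target stays inside root
def safe_read(root: Path, relative: str) -> str:
    target = (root / relative).resolve()
    if not target.is_relative_to(root.resolve()):
        raise ValueError(f"Path escapes workspace root: '{relative}'")
    return target.read_text()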

Review Report Format

After review, write a report to reports/PR-{number}-{branch}.md:

# Review Report: PR #{number} - {branch}

**Reviewer:** {your name}
**Date:** {YYYY-MM-DD}
**Status:** [APPROVED / CHANGES_REQUESTED / BLOCKED]

## Summary
Brief description of what this PR does and overall quality assessment.

## Detailed Findings

### ✅ Passed
- [List items that passed review]
- [Be specific: "Tests cover 85% of new code"]

### ⚠️ Warnings (Non-blocking)
- [Minor issues that don't block merge]
- [Style suggestions]
- [Future improvements]

### ❌ Blockers (Must fix)
1. **[Category]** [Specific issue]
   - **Location:** `file.py:123`
   - **Problem:** [What's wrong]
   - **Fix:** [Exactly what to change]
   - **Why:** [Why this matters]

2. **[Category]** [Specific issue]
   - ...

## Token Impact Analysis
- Component: [what changed]
- Before: [X] tokens
- After: [Y] tokens
- Impact: [+/- Z] tokens
- Within budget: [Yes/No]

## Test Coverage
- New code coverage: [X]%
- Tests pass: [Yes/No]
- Integration tests: [Present/Missing]

## Architecture Review
- Follows existing patterns: [Yes/No]
- Introduces new dependencies: [List if any]
- Breaking changes: [Yes/No - explain if yes]

## Research Review
- Alternatives considered: [Listed/None]
- Sources cited: [Yes/No]
- Best practices followed: [Yes/No]
- Research documented: [Yes/No - location]

## Code Quality Score
- Structure: [0-10]
- Testing: [0-10]
- Documentation: [0-10]
- Logic: [0-10]
- **Overall: [0-10]**

## Action Items
- [ ] [Specific fix needed]
- [ ] [Specific fix needed]
- [ ] [Test to add]

## Verdict
[APPROVED / CHANGES_REQUESTED / BLOCKED]

**If CHANGES_REQUESTED:** 
- Address all blockers
- Re-request review when ready

**If BLOCKED:**
- Major issues require architecture discussion
- Schedule meeting before continuing

Severity Levels

🔴 BLOCKING (Cannot merge)

  • Missing tests for new functionality
  • Token budget exceeded
  • Bare except: clauses
  • Production debugging code (print statements)
  • Breaking changes without documentation
  • Security vulnerabilities
  • Tests failing
  • Type check errors
  • Architecture violations

🟡 CHANGES_REQUESTED (Fix before merge)

  • Unclear variable names
  • Missing docstrings
  • Inefficient algorithms
  • Missing error handling
  • Unclear commit messages
  • Minor style issues

🟢 APPROVED (Optional suggestions)

  • Style preferences
  • Future improvements
  • Optional refactors

Common Issues to Watch For

Issue 1: Tool Parsing Duplication

# ❌ WRONG - Multiple parsers
def parse_tools_v1(text): ...
def parse_tools_v2(text): ...
def parse_tools_legacy(text): ...

# ✅ CORRECT - Single parser
TOOL_PATTERN = r'TOOL:\s*(\w+)\s*\nARGUMENTS:\s*(\{[^}]*\})'

Check: Search for "def parse" - should be ONE per format.

Issue 2: Token Bloat

# ❌ WRONG - Too verbose
SYSTEM_PROMPT = """
You are an AI assistant. Here are detailed instructions...
[2000 words of explanation]
[10 examples]
"""

# ✅ CORRECT - Concise
SYSTEM_PROMPT = """Use TOOL: name\nARGUMENTS: {...} format. Available: read, write, bash."""

Check: Count tokens, verify ≤ 2000.

Issue 3: Architecture Drift

# ❌ WRONG - Mixing concerns in one file
# src/api/routes.py
def handle_request(): ...
def parse_tools(): ...
def execute_tool(): ...
def format_response(): ...

# ✅ CORRECT - Separated
# src/api/routes.py - only HTTP handling
# src/tools/parser.py - only parsing
# src/tools/executor.py - only execution

Check: Each module has ONE responsibility.

Issue 4: Debug Code Left In

# ❌ WRONG
def process(data):
    print(f"DEBUG: data={data}")  # REMOVE THIS
    result = transform(data)
    print(f"DEBUG: result={result}")  # REMOVE THIS
    return result

# ✅ CORRECT
logger = logging.getLogger(__name__)

def process(data):
    logger.debug("Processing data", extra={"data_size": len(data)})
    return transform(data)

Check: grep -rn "print(" src/ --include="*.py" | grep -vE '^[^:]+:[0-9]+:\s*#'

Issue 5: Missing Error Context

# ❌ WRONG
raise ValueError("Invalid input")

# ✅ CORRECT
raise ValueError(f"Invalid model format: '{model_str}'. Expected: 'name:size:quant' (e.g., 'qwen:7b:q4')")

Check: All errors explain what was expected vs received.

Review Workflow

  1. First Pass: Structure (5 min)

    • Check branch name, commits, no debug code
    • If failed → Write report, BLOCK
  2. Second Pass: Quality (10 min)

    • Run tests, check types, review code
    • If failed → Write report, CHANGES_REQUESTED
  3. Third Pass: Deep Dive (15 min)

    • Read logic, check edge cases
    • Verify token counts
    • Check architecture
    • Write detailed report
  4. Final Decision (5 min)

    • APPROVE / CHANGES_REQUESTED / BLOCK
    • Write report to reports/ folder
    • Post summary in PR comments

Total time per review: 30-35 minutes

Reviewer Self-Check

Before submitting review:

  • I ran all tests locally
  • I checked type hints
  • I counted tokens (if applicable)
  • I read every line of changed code
  • My feedback is specific and actionable
  • I explained WHY for each blocker
  • I wrote a report to reports/ folder

Escalation

Escalate to architecture discussion if:

  • PR changes core patterns
  • Token budget cannot be met
  • Two reviewers disagree
  • Breaking changes proposed

Don't just approve to be nice. Don't let technical debt accumulate.

Report Storage

All reports go in reports/ folder:

reports/
├── PR-123-fix-tool-parsing.md
├── PR-124-add-federation.md
├── PR-125-refactor-consensus.md
└── README.md  # Index of all reviews

This folder is gitignored - reports stay local.

Generate index with:

ls -1 reports/PR-*.md | sort -t'-' -k2 -n > reports/README.md

Remember: You're the last line of defense against technical debt. Be thorough, be kind, be strict.