d22c52ec04
- AGENT_WORKER.md: Added Rule 3 for minimal, maintainable, modular code - AGENT_REVIEW.md: Added strict enforcement check in Phase 2 - Emphasizes single responsibility, clean interfaces, and production quality - Reviewers must block code that doesn't meet these standards
441 lines
12 KiB
Markdown
441 lines
12 KiB
Markdown
# Agent Reviewer Rules
|
|
|
|
> **⚠️ IMPORTANT:** This document is for REVIEW AGENTS who handle commits, PRs, and code reviews.
|
|
> Regular agents follow AGENT_WORKER.md for implementation tasks and DO NOT make commits.
|
|
|
|
## Review Philosophy
|
|
|
|
**Mission:** Prevent the circular development patterns identified in commit history.
|
|
|
|
**Standards:**
|
|
- Reject code that doesn't meet quality bar
|
|
- Ask for tests, don't accept "I'll add them later"
|
|
- Check token counts for prompt changes
|
|
- Verify architectural consistency
|
|
- Demand clear error messages
|
|
|
|
**Reviewer Authority:**
|
|
- Can block PR for: missing tests, token bloat, architecture violations
|
|
- Cannot approve own code
|
|
- Must provide constructive feedback with specific fixes
|
|
|
|
## Review Checklist
|
|
|
|
### Phase 1: Structure & Hygiene (Block if failed)
|
|
|
|
- [ ] **Branch naming follows convention**
|
|
- Format: `type/description` (e.g., `fix/tool-parsing`)
|
|
- Not: `quick-fix`, `temp-branch`, `dev`
|
|
|
|
- [ ] **Commit messages are clear**
|
|
- Format: `type(scope): description`
|
|
- No: `fix stuff`, `WIP`, `asdf`, `omg finally`
|
|
- Each commit should be reviewable independently
|
|
|
|
- [ ] **No production debugging code**
|
|
- Search for: `print(`, `console.log`, `debugger`, `TODO`, `FIXME`, `XXX`
|
|
- Check: No commented-out code blocks
|
|
- Check: No temporary files committed
|
|
|
|
- [ ] **Git history is clean**
|
|
- No "fix typo" commits after initial commit
|
|
- No "WIP" commits in PR
|
|
- No merge commits (rebase instead)
|
|
- Squash fixup commits
|
|
|
|
### Phase 2: Code Quality (Block if failed)
|
|
|
|
- [ ] **Tests exist and pass**
|
|
- Unit tests for new functions
|
|
- Integration tests for API changes
|
|
- Run: `pytest -v` (must pass)
|
|
- Coverage: ≥80% for new code
|
|
- **BLOCKING:** No tests = No merge
|
|
|
|
- [ ] **Type hints present**
|
|
- All function parameters typed
|
|
- All return values typed
|
|
- Run: `mypy src/` (must pass with zero errors)
|
|
|
|
- [ ] **No code smells**
|
|
- No functions > 50 lines
|
|
- No files > 300 lines
|
|
- No indentation > 3 levels deep
|
|
- No circular imports
|
|
- No duplicate code (>3 lines copied)
|
|
|
|
- [ ] **Minimal, Maintainable, Modular Code**
|
|
- **Minimal:** Only code needed to solve the problem, no over-engineering
|
|
- **Maintainable:** Clear names, self-documenting, consistent style
|
|
- **Modular:** Single Responsibility Principle, loose coupling, clear interfaces
|
|
- **STRICT ENFORCEMENT:**
|
|
- Functions should do ONE thing (if it does 2+ things, break it up)
|
|
- No monolithic blocks (>50 lines in one function)
|
|
- Clear separation of concerns
|
|
- Interfaces between modules are stable and well-defined
|
|
- Easy to understand for new maintainers
|
|
- No "temp" or "quick" solutions - production quality only
|
|
- **BLOCKING:** Code that is too complex, monolithic, or poorly structured must be rejected
|
|
|
|
- [ ] **Error handling is robust**
|
|
- No bare `except:` clauses
|
|
- All errors have clear messages
|
|
- No silent failures
|
|
- Edge cases handled
|
|
|
|
- [ ] **Documentation is adequate**
|
|
- All public functions have docstrings
|
|
- Complex logic has inline comments
|
|
- README updated if user-facing change
|
|
- Architecture doc updated if pattern changes
|
|
|
|
### Phase 3: Token Budget (Block if failed)
|
|
|
|
**For any prompt/instruction changes:**
|
|
|
|
- [ ] **Token count documented**
|
|
- Before: X tokens
|
|
- After: Y tokens
|
|
- Change: +/- Z tokens
|
|
|
|
- [ ] **Within budget**
|
|
- System prompt + instructions ≤ 2000 tokens (HARD LIMIT)
|
|
- Leaves ≥ 50% context window for user input
|
|
- **BLOCKING:** Over budget = Request reduction
|
|
|
|
- [ ] **Efficient wording**
|
|
- No redundant examples
|
|
- No verbose explanations
|
|
- Prefer code over prose
|
|
|
|
**Token Counting Command:**
|
|
```bash
|
|
# Count tokens in a string
|
|
echo "Your prompt here" | python -c "import sys; import tiktoken; enc = tiktoken.get_encoding('cl100k_base'); print(len(enc.encode(sys.stdin.read())))"
|
|
```
|
|
|
|
### Phase 4: Architecture (Block if failed)
|
|
|
|
- [ ] **Consistent with ARCHITECTURE.md**
|
|
- No new patterns without updating docs
|
|
- No mixing of concerns
|
|
- Follows existing module structure
|
|
|
|
- [ ] **No architecture changes in fixes**
|
|
- Bug fixes should not refactor
|
|
- Refactors should be separate PRs
|
|
- **Exception:** If fix requires arch change, document WHY
|
|
|
|
- [ ] **Parser rules**
|
|
- Only ONE parser per format
|
|
- No alternative parsing paths
|
|
- Clear regex patterns
|
|
- Handles all documented cases
|
|
|
|
- [ ] **No feature flags in core**
|
|
- Code should not have `if config.get("ENABLE_X"):`
|
|
- Pick one approach, remove old one
|
|
- A/B testing only in separate branch
|
|
|
|
### Phase 5: Research & Continuous Learning
|
|
|
|
**For significant changes (>100 lines or new algorithms):**
|
|
|
|
- [ ] **Research documented**
|
|
- Check `research/` folder for related findings
|
|
- PR description mentions alternatives considered
|
|
- Links to sources (docs, papers, repos)
|
|
- Not: "I thought this would work"
|
|
- Yes: "Based on [source], this approach handles [case] better than [alternative]"
|
|
|
|
- [ ] **Best practices followed**
|
|
- Implementation matches current language/framework conventions
|
|
- No deprecated patterns
|
|
- Modern Python features used appropriately (3.9+)
|
|
|
|
- [ ] **No reinvention**
|
|
- Check if standard library solves the problem
|
|
- Check if well-maintained package exists
|
|
- If custom implementation needed, document WHY
|
|
|
|
**Research Documentation Requirements:**
|
|
```markdown
|
|
## Research
|
|
- Alternatives considered: [list]
|
|
- Sources: [links]
|
|
- Decision: [why chosen approach]
|
|
- Benchmarks: [if applicable]
|
|
```
|
|
|
|
### Phase 6: Logic Correctness
|
|
|
|
- [ ] **Logic is sound**
|
|
- Read through the code
|
|
- Check edge cases
|
|
- Verify error conditions
|
|
- Question anything unclear
|
|
|
|
- [ ] **No performance regressions**
|
|
- No blocking I/O in async functions (unless wrapped)
|
|
- No memory leaks
|
|
- No N+1 queries
|
|
- Reasonable algorithmic complexity
|
|
|
|
- [ ] **Security check**
|
|
- No SQL injection vectors
|
|
- No command injection (bash execution sanitized)
|
|
- Path traversal protection (for file ops)
|
|
- No secrets in code
|
|
|
|
## Review Report Format
|
|
|
|
After review, write a report to `reports/PR-{number}-{branch}.md`:
|
|
|
|
```markdown
|
|
# Review Report: PR #{number} - {branch}
|
|
|
|
**Reviewer:** {your name}
|
|
**Date:** {YYYY-MM-DD}
|
|
**Status:** [APPROVED / CHANGES_REQUESTED / BLOCKED]
|
|
|
|
## Summary
|
|
Brief description of what this PR does and overall quality assessment.
|
|
|
|
## Detailed Findings
|
|
|
|
### ✅ Passed
|
|
- [List items that passed review]
|
|
- [Be specific: "Tests cover 85% of new code"]
|
|
|
|
### ⚠️ Warnings (Non-blocking)
|
|
- [Minor issues that don't block merge]
|
|
- [Style suggestions]
|
|
- [Future improvements]
|
|
|
|
### ❌ Blockers (Must fix)
|
|
1. **[Category]** [Specific issue]
|
|
- **Location:** `file.py:123`
|
|
- **Problem:** [What's wrong]
|
|
- **Fix:** [Exactly what to change]
|
|
- **Why:** [Why this matters]
|
|
|
|
2. **[Category]** [Specific issue]
|
|
- ...
|
|
|
|
## Token Impact Analysis
|
|
- Component: [what changed]
|
|
- Before: [X] tokens
|
|
- After: [Y] tokens
|
|
- Impact: [+/- Z] tokens
|
|
- Within budget: [Yes/No]
|
|
|
|
## Test Coverage
|
|
- New code coverage: [X]%
|
|
- Tests pass: [Yes/No]
|
|
- Integration tests: [Present/Missing]
|
|
|
|
## Architecture Review
|
|
- Follows existing patterns: [Yes/No]
|
|
- Introduces new dependencies: [List if any]
|
|
- Breaking changes: [Yes/No - explain if yes]
|
|
|
|
## Research Review
|
|
- Alternatives considered: [Listed/None]
|
|
- Sources cited: [Yes/No]
|
|
- Best practices followed: [Yes/No]
|
|
- Research documented: [Yes/No - location]
|
|
|
|
## Code Quality Score
|
|
- Structure: [0-10]
|
|
- Testing: [0-10]
|
|
- Documentation: [0-10]
|
|
- Logic: [0-10]
|
|
- **Overall: [0-10]**
|
|
|
|
## Action Items
|
|
- [ ] [Specific fix needed]
|
|
- [ ] [Specific fix needed]
|
|
- [ ] [Test to add]
|
|
|
|
## Verdict
|
|
[APPROVED / CHANGES_REQUESTED / BLOCKED]
|
|
|
|
**If CHANGES_REQUESTED:**
|
|
- Address all blockers
|
|
- Re-request review when ready
|
|
|
|
**If BLOCKED:**
|
|
- Major issues require architecture discussion
|
|
- Schedule meeting before continuing
|
|
```
|
|
|
|
## Severity Levels
|
|
|
|
### 🔴 BLOCKING (Cannot merge)
|
|
- Missing tests for new functionality
|
|
- Token budget exceeded
|
|
- Bare `except:` clauses
|
|
- Production debugging code (`print` statements)
|
|
- Breaking changes without documentation
|
|
- Security vulnerabilities
|
|
- Tests failing
|
|
- Type check errors
|
|
- Architecture violations
|
|
|
|
### 🟡 CHANGES_REQUESTED (Fix before merge)
|
|
- Unclear variable names
|
|
- Missing docstrings
|
|
- Inefficient algorithms
|
|
- Missing error handling
|
|
- Unclear commit messages
|
|
- Minor style issues
|
|
|
|
### 🟢 APPROVED (Optional suggestions)
|
|
- Style preferences
|
|
- Future improvements
|
|
- Optional refactors
|
|
|
|
## Common Issues to Watch For
|
|
|
|
### Issue 1: Tool Parsing Duplication
|
|
```python
|
|
# ❌ WRONG - Multiple parsers
|
|
def parse_tools_v1(text): ...
|
|
def parse_tools_v2(text): ...
|
|
def parse_tools_legacy(text): ...
|
|
|
|
# ✅ CORRECT - Single parser
|
|
TOOL_PATTERN = r'TOOL:\s*(\w+)\s*\nARGUMENTS:\s*(\{[^}]*\})'
|
|
```
|
|
|
|
**Check:** Search for "def parse" - should be ONE per format.
|
|
|
|
### Issue 2: Token Bloat
|
|
```python
|
|
# ❌ WRONG - Too verbose
|
|
SYSTEM_PROMPT = """
|
|
You are an AI assistant. Here are detailed instructions...
|
|
[2000 words of explanation]
|
|
[10 examples]
|
|
"""
|
|
|
|
# ✅ CORRECT - Concise
|
|
SYSTEM_PROMPT = """Use TOOL: name\nARGUMENTS: {...} format. Available: read, write, bash."""
|
|
```
|
|
|
|
**Check:** Count tokens, verify < 2000.
|
|
|
|
### Issue 3: Architecture Drift
|
|
```python
|
|
# ❌ WRONG - Mixing concerns in one file
|
|
# src/api/routes.py
|
|
def handle_request(): ...
|
|
def parse_tools(): ...
|
|
def execute_tool(): ...
|
|
def format_response(): ...
|
|
|
|
# ✅ CORRECT - Separated
|
|
# src/api/routes.py - only HTTP handling
|
|
# src/tools/parser.py - only parsing
|
|
# src/tools/executor.py - only execution
|
|
```
|
|
|
|
**Check:** Each module has ONE responsibility.
|
|
|
|
### Issue 4: Debug Code Left In
|
|
```python
|
|
# ❌ WRONG
|
|
def process(data):
|
|
print(f"DEBUG: data={data}") # REMOVE THIS
|
|
result = transform(data)
|
|
print(f"DEBUG: result={result}") # REMOVE THIS
|
|
return result
|
|
|
|
# ✅ CORRECT
|
|
logger = logging.getLogger(__name__)
|
|
|
|
def process(data):
|
|
logger.debug("Processing data", extra={"data_size": len(data)})
|
|
return transform(data)
|
|
```
|
|
|
|
**Check:** `grep -r "print(" src/ --include="*.py" | grep -v "^#"`
|
|
|
|
### Issue 5: Missing Error Context
|
|
```python
|
|
# ❌ WRONG
|
|
raise ValueError("Invalid input")
|
|
|
|
# ✅ CORRECT
|
|
raise ValueError(f"Invalid model format: '{model_str}'. Expected: 'name:size:quant' (e.g., 'qwen:7b:q4')")
|
|
```
|
|
|
|
**Check:** All errors explain what was expected vs received.
|
|
|
|
## Review Workflow
|
|
|
|
1. **First Pass: Structure** (5 min)
|
|
- Check branch name, commits, no debug code
|
|
- If failed → Write report, BLOCK
|
|
|
|
2. **Second Pass: Quality** (10 min)
|
|
- Run tests, check types, review code
|
|
- If failed → Write report, CHANGES_REQUESTED
|
|
|
|
3. **Third Pass: Deep Dive** (15 min)
|
|
- Read logic, check edge cases
|
|
- Verify token counts
|
|
- Check architecture
|
|
- Write detailed report
|
|
|
|
4. **Final Decision** (5 min)
|
|
- APPROVE / CHANGES_REQUESTED / BLOCK
|
|
- Write report to `reports/` folder
|
|
- Post summary in PR comments
|
|
|
|
**Total time per review: 30-35 minutes**
|
|
|
|
## Reviewer Self-Check
|
|
|
|
Before submitting review:
|
|
- [ ] I ran all tests locally
|
|
- [ ] I checked type hints
|
|
- [ ] I counted tokens (if applicable)
|
|
- [ ] I read every line of changed code
|
|
- [ ] My feedback is specific and actionable
|
|
- [ ] I explained WHY for each blocker
|
|
- [ ] I wrote a report to `reports/` folder
|
|
|
|
## Escalation
|
|
|
|
Escalate to architecture discussion if:
|
|
- PR changes core patterns
|
|
- Token budget cannot be met
|
|
- Two reviewers disagree
|
|
- Breaking changes proposed
|
|
|
|
**Don't just approve to be nice.**
|
|
**Don't let technical debt accumulate.**
|
|
|
|
## Report Storage
|
|
|
|
All reports go in `reports/` folder:
|
|
```
|
|
reports/
|
|
├── PR-123-fix-tool-parsing.md
|
|
├── PR-124-add-federation.md
|
|
├── PR-125-refactor-consensus.md
|
|
└── README.md # Index of all reviews
|
|
```
|
|
|
|
**This folder is gitignored - reports stay local.**
|
|
|
|
Generate index with:
|
|
```bash
|
|
ls -1 reports/PR-*.md | sort -t'-' -k2 -n > reports/README.md
|
|
```
|
|
|
|
---
|
|
|
|
**Remember: You're the last line of defense against technical debt. Be thorough, be kind, be strict.**
|