# Agent Reviewer Rules
> **⚠️ IMPORTANT:** This document is for REVIEW AGENTS who handle commits, PRs, and code reviews.
> Regular agents follow AGENT_WORKER.md for implementation tasks and DO NOT make commits.
## Review Philosophy
**Mission:** Prevent the circular development patterns identified in commit history.
**Standards:**
- Reject code that doesn't meet quality bar
- Ask for tests, don't accept "I'll add them later"
- Check token counts for prompt changes
- Verify architectural consistency
- Demand clear error messages
**Reviewer Authority:**
- Can block PR for: missing tests, token bloat, architecture violations
- Cannot approve own code
- Must provide constructive feedback with specific fixes
## Review Checklist
### Phase 1: Structure & Hygiene (Block if failed)
- [ ] **Branch naming follows convention**
- Format: `type/description` (e.g., `fix/tool-parsing`)
- Not: `quick-fix`, `temp-branch`, `dev`
- [ ] **Commit messages are clear**
- Format: `type(scope): description`
- No: `fix stuff`, `WIP`, `asdf`, `omg finally`
- Each commit should be reviewable independently
- [ ] **No production debugging code**
- Search for: `print(`, `console.log`, `debugger`, `TODO`, `FIXME`, `XXX`
- Check: No commented-out code blocks
- Check: No temporary files committed
- [ ] **Git history is clean**
- No "fix typo" commits after initial commit
- No "WIP" commits in PR
- No merge commits (rebase instead)
- Squash fixup commits
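The structure checks above lend themselves to a quick script. A minimal sketch, assuming the PR branch is checked out, sources live under `src/`, and `main` is the base branch — adjust paths and the branch-type list to your repo:

```shell
#!/usr/bin/env bash
# Phase 1 hygiene pre-checks (sketch; no set -e so every check reports).

# Branch naming: type/description
branch=$(git rev-parse --abbrev-ref HEAD 2>/dev/null)
if ! printf '%s' "$branch" | grep -qE '^(fix|feat|docs|refactor|test|chore)/[a-z0-9-]+$'; then
  echo "BLOCK: branch '$branch' does not match type/description"
fi

# Leftover debug markers
if grep -rnE 'print\(|console\.log|debugger|TODO|FIXME|XXX' src/ --include='*.py' 2>/dev/null; then
  echo "BLOCK: debug code or TODO markers found"
fi

# WIP / fixup commits in the PR range
if git log --oneline main..HEAD 2>/dev/null | grep -qiE 'wip|fix typo'; then
  echo "BLOCK: squash WIP / fixup commits before requesting review"
fi
```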
### Phase 2: Code Quality (Block if failed)
- [ ] **Tests exist and pass**
- Unit tests for new functions
- Integration tests for API changes
- Run: `pytest -v` (must pass)
- Coverage: ≥80% for new code
- **BLOCKING:** No tests = No merge
- [ ] **Type hints present**
- All function parameters typed
- All return values typed
- Run: `mypy src/` (must pass with zero errors)
- [ ] **No code smells**
- No functions > 50 lines
- No files > 300 lines
- No indentation > 3 levels deep
- No circular imports
- No duplicate code (>3 lines copied)
- [ ] **Minimal, Maintainable, Modular Code**
- **Minimal:** Only code needed to solve the problem, no over-engineering
- **Maintainable:** Clear names, self-documenting, consistent style
- **Modular:** Single Responsibility Principle, loose coupling, clear interfaces
- **STRICT ENFORCEMENT:**
- Functions should do ONE thing (if a function does 2+ things, break it up)
- No monolithic blocks (>50 lines in one function)
- Clear separation of concerns
- Interfaces between modules are stable and well-defined
- Easy to understand for new maintainers
- No "temp" or "quick" solutions - production quality only
- **BLOCKING:** Code that is too complex, monolithic, or poorly structured must be rejected
- [ ] **Error handling is robust**
- No bare `except:` clauses
- All errors have clear messages
- No silent failures
- Edge cases handled
- [ ] **Documentation is adequate**
- All public functions have docstrings
- Complex logic has inline comments
- README updated if user-facing change
- Architecture doc updated if pattern changes
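As a concrete reference point, a function that clears the Phase 2 bar might look like this. The name `parse_tool_arguments` is hypothetical, not part of the project API; the point is the shape: typed, documented, single responsibility, narrow exception handling, and an error message that states expected vs. received.

```python
from __future__ import annotations

import json


def parse_tool_arguments(raw: str) -> dict[str, object]:
    """Parse one ARGUMENTS payload into a dict.

    Raises ValueError carrying the offending input when the payload is
    not a JSON object, so failures are diagnosable from the message alone.
    """
    try:
        args = json.loads(raw)
    except json.JSONDecodeError as exc:  # narrow except, context preserved
        raise ValueError(f"ARGUMENTS is not valid JSON: {raw!r} ({exc})") from exc
    if not isinstance(args, dict):
        raise ValueError(
            f"ARGUMENTS must be a JSON object, got {type(args).__name__}: {raw!r}"
        )
    return args
```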
### Phase 3: Token Budget (Block if failed)
**For any prompt/instruction changes:**
- [ ] **Token count documented**
- Before: X tokens
- After: Y tokens
- Change: +/- Z tokens
- [ ] **Within budget**
- System prompt + instructions ≤ 2000 tokens (HARD LIMIT)
- Leaves ≥ 50% context window for user input
- **BLOCKING:** Over budget = Request reduction
- [ ] **Efficient wording**
- No redundant examples
- No verbose explanations
- Prefer code over prose
**Token Counting Command:**
```bash
# Count tokens in a string
echo "Your prompt here" | python -c "import sys; import tiktoken; enc = tiktoken.get_encoding('cl100k_base'); print(len(enc.encode(sys.stdin.read())))"
```
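For the Before/After numbers the checklist asks for, a small helper makes deltas reproducible. This is a sketch: it assumes `tiktoken` is installed (as the command above does) and falls back to a rough whitespace count otherwise — label fallback numbers as approximate in the report.

```python
def count_tokens(text: str) -> int:
    """Count tokens with tiktoken if available, else approximate."""
    try:
        import tiktoken  # pip install tiktoken
        enc = tiktoken.get_encoding("cl100k_base")
        return len(enc.encode(text))
    except ImportError:
        return len(text.split())  # rough approximation only


def token_delta(before: str, after: str) -> str:
    """Format the Before/After/Change lines for the review report."""
    b, a = count_tokens(before), count_tokens(after)
    return f"Before: {b} tokens\nAfter: {a} tokens\nChange: {a - b:+d} tokens"
```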
### Phase 4: Architecture (Block if failed)
- [ ] **Consistent with ARCHITECTURE.md**
- No new patterns without updating docs
- No mixing of concerns
- Follows existing module structure
- [ ] **No architecture changes in fixes**
- Bug fixes should not refactor
- Refactors should be separate PRs
- **Exception:** If fix requires arch change, document WHY
- [ ] **Parser rules**
- Only ONE parser per format
- No alternative parsing paths
- Clear regex patterns
- Handles all documented cases
- [ ] **No feature flags in core**
- Code should not have `if config.get("ENABLE_X"):`
- Pick one approach, remove old one
- A/B testing only in separate branch
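The feature-flag rule in miniature (all names hypothetical; the split rules stand in for a real parser):

```python
# ❌ WRONG - core code branches on a config flag; both paths stay alive forever
def parse_flagged(text: str, config: dict) -> list[str]:
    if config.get("ENABLE_NEW_PARSER"):
        return text.split(",")  # "new" path
    return text.split(";")      # "legacy" path that never gets deleted


# ✅ CORRECT - one approach; the old path was removed, not flagged off
def parse(text: str) -> list[str]:
    return text.split(",")
```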
### Phase 5: Research & Continuous Learning
**For significant changes (>100 lines or new algorithms):**
- [ ] **Research documented**
- Check `research/` folder for related findings
- PR description mentions alternatives considered
- Links to sources (docs, papers, repos)
- Not: "I thought this would work"
- Yes: "Based on [source], this approach handles [case] better than [alternative]"
- [ ] **Best practices followed**
- Implementation matches current language/framework conventions
- No deprecated patterns
- Modern Python features used appropriately (3.9+)
- [ ] **No reinvention**
- Check if standard library solves the problem
- Check if well-maintained package exists
- If custom implementation needed, document WHY
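A small illustration of the no-reinvention check, using a hypothetical caching case: the hand-rolled version is the kind of code a reviewer should flag, because `functools.lru_cache` already covers it.

```python
import functools

# ❌ Hand-rolled memoization a reviewer should question
_cache: dict[int, int] = {}


def fib_custom(n: int) -> int:
    if n not in _cache:
        _cache[n] = n if n < 2 else fib_custom(n - 1) + fib_custom(n - 2)
    return _cache[n]


# ✅ The standard library already solves this
@functools.lru_cache(maxsize=None)
def fib(n: int) -> int:
    return n if n < 2 else fib(n - 1) + fib(n - 2)
```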
**Research Documentation Requirements:**
```markdown
## Research
- Alternatives considered: [list]
- Sources: [links]
- Decision: [why chosen approach]
- Benchmarks: [if applicable]
```
### Phase 6: Logic Correctness
- [ ] **Logic is sound**
- Read through the code
- Check edge cases
- Verify error conditions
- Question anything unclear
- [ ] **No performance regressions**
- No blocking I/O in async functions (unless wrapped)
- No memory leaks
- No N+1 queries
- Reasonable algorithmic complexity
- [ ] **Security check**
- No SQL injection vectors
- No command injection (bash execution sanitized)
- Path traversal protection (for file ops)
- No secrets in code
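For the path-traversal item, one defensible shape is to resolve against a sandbox root and reject escapes. A sketch with hypothetical names (`safe_resolve`, the sandbox layout) — adapt it to the project's actual file-op entry points. For the command-injection item, the analogous stdlib tool is `shlex.quote` on anything interpolated into a shell command.

```python
from pathlib import Path


def safe_resolve(root: str, user_path: str) -> Path:
    """Resolve a user-supplied relative path, refusing escapes from root.

    resolve() collapses `..` segments and follows symlinks, so the
    containment check runs on the real target, not the raw string.
    """
    base = Path(root).resolve()
    candidate = (base / user_path).resolve()
    if candidate != base and base not in candidate.parents:
        raise ValueError(f"Path escapes sandbox {root!r}: {user_path!r}")
    return candidate
```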
## Review Report Format
After review, write a report to `reports/PR-{number}-{branch}.md`:
```markdown
# Review Report: PR #{number} - {branch}
**Reviewer:** {your name}
**Date:** {YYYY-MM-DD}
**Status:** [APPROVED / CHANGES_REQUESTED / BLOCKED]
## Summary
Brief description of what this PR does and overall quality assessment.
## Detailed Findings
### ✅ Passed
- [List items that passed review]
- [Be specific: "Tests cover 85% of new code"]
### ⚠️ Warnings (Non-blocking)
- [Minor issues that don't block merge]
- [Style suggestions]
- [Future improvements]
### ❌ Blockers (Must fix)
1. **[Category]** [Specific issue]
- **Location:** `file.py:123`
- **Problem:** [What's wrong]
- **Fix:** [Exactly what to change]
- **Why:** [Why this matters]
2. **[Category]** [Specific issue]
- ...
## Token Impact Analysis
- Component: [what changed]
- Before: [X] tokens
- After: [Y] tokens
- Impact: [+/- Z] tokens
- Within budget: [Yes/No]
## Test Coverage
- New code coverage: [X]%
- Tests pass: [Yes/No]
- Integration tests: [Present/Missing]
## Architecture Review
- Follows existing patterns: [Yes/No]
- Introduces new dependencies: [List if any]
- Breaking changes: [Yes/No - explain if yes]
## Research Review
- Alternatives considered: [Listed/None]
- Sources cited: [Yes/No]
- Best practices followed: [Yes/No]
- Research documented: [Yes/No - location]
## Code Quality Score
- Structure: [0-10]
- Testing: [0-10]
- Documentation: [0-10]
- Logic: [0-10]
- **Overall: [0-10]**
## Action Items
- [ ] [Specific fix needed]
- [ ] [Specific fix needed]
- [ ] [Test to add]
## Verdict
[APPROVED / CHANGES_REQUESTED / BLOCKED]
**If CHANGES_REQUESTED:**
- Address all blockers
- Re-request review when ready
**If BLOCKED:**
- Major issues require architecture discussion
- Schedule meeting before continuing
```
## Severity Levels
### 🔴 BLOCKING (Cannot merge)
- Missing tests for new functionality
- Token budget exceeded
- Bare `except:` clauses
- Production debugging code (`print` statements)
- Breaking changes without documentation
- Security vulnerabilities
- Tests failing
- Type check errors
- Architecture violations
### 🟡 CHANGES_REQUESTED (Fix before merge)
- Unclear variable names
- Missing docstrings
- Inefficient algorithms
- Missing error handling
- Unclear commit messages
- Minor style issues
### 🟢 APPROVED (Optional suggestions)
- Style preferences
- Future improvements
- Optional refactors
## Common Issues to Watch For
### Issue 1: Tool Parsing Duplication
```python
# ❌ WRONG - Multiple parsers
def parse_tools_v1(text): ...
def parse_tools_v2(text): ...
def parse_tools_legacy(text): ...
# ✅ CORRECT - Single parser
TOOL_PATTERN = r'TOOL:\s*(\w+)\s*\nARGUMENTS:\s*(\{[^}]*\})'
```
**Check:** Search for "def parse" - should be ONE per format.
### Issue 2: Token Bloat
```python
# ❌ WRONG - Too verbose
SYSTEM_PROMPT = """
You are an AI assistant. Here are detailed instructions...
[2000 words of explanation]
[10 examples]
"""
# ✅ CORRECT - Concise
SYSTEM_PROMPT = """Use TOOL: name\nARGUMENTS: {...} format. Available: read, write, bash."""
```
**Check:** Count tokens, verify < 2000.
### Issue 3: Architecture Drift
```python
# ❌ WRONG - Mixing concerns in one file
# src/api/routes.py
def handle_request(): ...
def parse_tools(): ...
def execute_tool(): ...
def format_response(): ...
# ✅ CORRECT - Separated
# src/api/routes.py - only HTTP handling
# src/tools/parser.py - only parsing
# src/tools/executor.py - only execution
```
**Check:** Each module has ONE responsibility.
### Issue 4: Debug Code Left In
```python
# ❌ WRONG
def process(data):
    print(f"DEBUG: data={data}")  # REMOVE THIS
    result = transform(data)
    print(f"DEBUG: result={result}")  # REMOVE THIS
    return result

# ✅ CORRECT
import logging

logger = logging.getLogger(__name__)

def process(data):
    logger.debug("Processing data", extra={"data_size": len(data)})
    return transform(data)
```
**Check:** `grep -rnE --include="*.py" '^[[:space:]]*print\(' src/` (anchoring on the call itself skips commented-out lines)
### Issue 5: Missing Error Context
```python
# ❌ WRONG
raise ValueError("Invalid input")
# ✅ CORRECT
raise ValueError(f"Invalid model format: '{model_str}'. Expected: 'name:size:quant' (e.g., 'qwen:7b:q4')")
```
**Check:** All errors explain what was expected vs received.
## Review Workflow
1. **First Pass: Structure** (5 min)
- Check branch name, commits, no debug code
- If failed → Write report, BLOCK
2. **Second Pass: Quality** (10 min)
- Run tests, check types, review code
- If failed → Write report, CHANGES_REQUESTED
3. **Third Pass: Deep Dive** (15 min)
- Read logic, check edge cases
- Verify token counts
- Check architecture
- Write detailed report
4. **Final Decision** (5 min)
- APPROVE / CHANGES_REQUESTED / BLOCK
- Write report to `reports/` folder
- Post summary in PR comments
**Total time per review: 30-35 minutes**
## Reviewer Self-Check
Before submitting review:
- [ ] I ran all tests locally
- [ ] I checked type hints
- [ ] I counted tokens (if applicable)
- [ ] I read every line of changed code
- [ ] My feedback is specific and actionable
- [ ] I explained WHY for each blocker
- [ ] I wrote a report to `reports/` folder
## Escalation
Escalate to architecture discussion if:
- PR changes core patterns
- Token budget cannot be met
- Two reviewers disagree
- Breaking changes proposed
**Don't just approve to be nice.**
**Don't let technical debt accumulate.**
## Report Storage
All reports go in `reports/` folder:
```
reports/
├── PR-123-fix-tool-parsing.md
├── PR-124-add-federation.md
├── PR-125-refactor-consensus.md
└── README.md # Index of all reviews
```
**This folder is gitignored - reports stay local.**
Generate index with:
```bash
ls -1 reports/PR-*.md | sort -t'-' -k2 -n > reports/README.md
```
---
**Remember: You're the last line of defense against technical debt. Be thorough, be kind, be strict.**