local_swarm/AGENT_WORKER.md
sleepy 580d1e5d17 feat: comprehensive tool system improvements and webfetch support (#3)
* feat: enhanced tool instructions for multi-step operations

- Add comprehensive examples for ls, find, grep, mkdir, npm init, etc.
- Explain multi-step workflow (explore → read → write)
- Tool system already supports chaining via conversation history
- Bash tool supports: ls, find, grep, cat, mkdir, cd, npm, etc.
- 30 second timeout on commands
- Output limited to 3000 chars for readability

* Cleanup: Consolidate documentation and tidy codebase

Documentation:
- Consolidate 6 markdown files into simplified README.md
- Remove redundant docs: TODO.md, NETWORK.md, REVIEW.md, PLAN.md, CONTEXT.md, GUIDE.md
- Add ARCHITECTURE.md with clean technical overview
- README now focuses on quick start and core concepts

Code verification:
- Verified blocking I/O properly wrapped in asyncio.to_thread()
- Confirmed locks initialized correctly in backends
- AMD VRAM detection uses proper regex (takes max value, not first match)
- All exception handling uses 'except Exception:' (not bare except)

Tool execution improvements (existing changes):
- Better working directory handling with project root detection
- Extended timeouts for package managers (300s)
- Multi-tool call parsing support
- Improved error handling and logging

Note: system prompt size remains a concern - 30k tokens is too large for 16-32k-token context windows

* docs: add development patterns analysis

Document circular development issues identified in commit history:
- Tool execution went back-and-forth 3+ times (server-side vs client-side)
- Tool instruction size cycled: 40k tokens → 300 tokens → removed → enhanced
- 8+ parsing fixes for same issues (no tests)
- 6 debug-only commits (production debugging)

Provides recommendations to prevent future cycles:
1. Pick one architecture and stick with it
2. Add unit tests before fixes
3. Token budget (<2000 for instructions)
4. One format only (remove alternative parsers)
5. Integration test script
6. Separate concerns into smaller modules
7. Design doc before code changes
8. CI/CD with automated testing

* docs: add comprehensive agent guidelines

AGENT_WORKER.md (600+ lines):
- Pre-flight checklist: token budget, test plan, design doc
- Coding rules: TDD, no debug code, architecture consistency
- Git workflow: branching strategy, commit rules, release process
- Testing requirements: unit (≥80%), integration structure
- Code quality: PEP 8, type hints, max 50 lines per function
- Architecture: no feature flags, separation of concerns
- Continuous learning: research requirements, documentation
- Forbidden patterns: bare except, production debugging, etc.

AGENT_REVIEW.md (400+ lines):
- Review philosophy: prevent circular development
- 6-phase review checklist: structure, quality, tokens, architecture, research, logic
- Report format with token impact analysis
- Severity levels: blocking vs warnings vs approved
- Common issues with examples (good vs bad)
- Review workflow: 30-35 min per PR
- Reports stored in reports/ folder (gitignored)

Also added:
- tests/test_tool_parsing.py - example test following guidelines
- Updated DEVELOPMENT_PATTERNS.md with recommendations

Reports folder in .gitignore for local review storage

* chore: gitignore review reports folder

* feat: fix tool execution and enhance instructions with accurate token counting

- Enhanced tool instructions (1041 tokens, within 2000 budget)
- Added tiktoken>=0.5.0 for accurate token counting
- Fixed subprocess hang by adding stdin=subprocess.DEVNULL
- Removed 9 DEBUG print statements from routes.py
- Added tests for instruction content and token budget verification
- All tests pass (11/11)

Resolves blockers from previous review:
- Token budget verified ✓
- Token documentation added ✓
- Debug code cleaned ✓
- Missing tests added ✓

* feat: implement comprehensive tool system with proper logging

Major improvements to tool instructions and execution:
- Enhanced tool instructions with 7-step task completion workflow
- Added markdown code block fallback parser for tool calls
- Fixed subprocess hang with stdin=subprocess.DEVNULL
- Fixed streaming path to return tool_calls (enabling multi-turn conversations)
- Added complete React project creation example with verification steps
- Token count: 1,743 tokens (within 2,000 limit)

Logging infrastructure:
- Created centralized logging configuration (src/utils/logging_config.py)
- Replaced 80+ print statements with logger.debug()
- Set log level to DEBUG for development
- All modules now use proper logging instead of print

Testing:
- Added 4 new tests for markdown parsing and instruction content
- All 13 tests passing
- Token budget verification test

Documentation:
- Added comprehensive design docs for all major changes
- Added test plans for verification
- Created helper scripts for logging migration

Files changed:
- main.py: Added logging setup
- src/api/routes.py: Tool instructions, streaming fixes, logging
- src/tools/executor.py: subprocess fix, logging
- src/utils/: New logging configuration module
- tests/test_tool_parsing.py: New tests
- docs/: Design decisions and test plans
- scripts/: Helper scripts for development

* refactor: simplify tool instructions to 109 tokens for 7B model

Reduced from 1,743 tokens to 109 tokens (94% reduction) to help
qwen2.5 7B 4bit model follow instructions better.

Changes:
- Removed complex workflow documentation
- Removed multi-turn conversation examples
- Removed lengthy anti-patterns
- Kept only essential format and rules
- Updated tests to match simplified content

Before: 1,743 tokens, 6,004 chars (87% of budget)
After: 109 tokens, 392 chars (5.5% of budget)

This should make it much easier for smaller models to:
1. Understand they must use tools
2. Follow the simple TOOL: format
3. Not get overwhelmed by instructions

* refactor: make tool instructions ultra-direct for 7B models

Further simplify instructions to prevent model from adding explanations.

Before: 109 tokens - model still added explanatory text
After: 86 tokens - ultra-direct commands

Key changes:
- Start with 'You MUST use tools. DO NOT explain.'
- 'OUTPUT THIS EXACT FORMAT - NOTHING ELSE'
- Removed all examples and pleasantries
- Added 'NEVER' rules in all caps
- 'ONLY output TOOL: lines'

The model was outputting:
'1. First, install... TOOL: bash ARGUMENTS: {...}'

Now should output just:
'TOOL: bash
ARGUMENTS: {...}'

This should force the 7B qwen model to stop explaining and just execute.

* refactor: move tool instructions to external config file

Moves hardcoded tool instructions from routes.py to external config file
for better maintainability and easier editing.

Changes:
- Created config/prompts/tool_instructions.txt
- Added _load_tool_instructions() function with caching
- Falls back to default if config file not found
- Updated tests to use the loader function
- Added proper error handling

Benefits:
- Easier to modify instructions without code changes
- Instructions can be edited by non-developers
- Cleaner separation of config vs code
- Supports hot-reloading (cached but easy to invalidate)

Token count: 86 tokens (loaded from file)
Location: config/prompts/tool_instructions.txt

* refactor: simplify tool instructions further and add debug logging

- Reduced instructions to bare minimum: 50 tokens
- Added debug logging to verify instructions are sent
- Removed all caps and aggressive language
- Made instructions more straightforward

Instructions now:
'Use tools to execute commands. Output only tool calls.
Format: TOOL: bash ARGUMENTS: {...}
No explanations. No numbered lists. No markdown. Only tool calls.'

This should be easier for 7B models to follow while still
conveying the essential requirements.

* feat: improve tool parser to handle 7B model output variations

Enhanced parse_tool_calls() with multiple fallback strategies:

1. Standard TOOL:/ARGUMENTS: format (original)
2. Markdown code blocks (fenced with triple backticks)
3. Numbered list items (1. npm install ...)
4. Standalone bash commands (npm, npx, mkdir, etc.)

Now handles messy output from small models like:
'1. Install: npm install -g create-react-app'
'2. Create: create-react-app hello-world'

Parses these into chained bash commands for execution.

Also simplified instructions to a 50-token minimum:
'Use tools to execute commands. Output only tool calls.
Format: TOOL: bash ARGUMENTS: {...}
No explanations. No numbered lists. No markdown. Only tool calls.'

This combination should make 7B models much more likely to
have their output successfully parsed and executed.

* fix: improve command extraction for 7B model output

Parser now extracts bash commands from any line containing:
- npm, npx, mkdir, cd, ls, cat, echo, git, python, pip, node, yarn
- create-react-app (added for React projects)

Example: Extracts 'npm install -g create-react-app' from:
'1. Install: npm install -g create-react-app'

Chains multiple commands with && for sequential execution.

This should now successfully parse the numbered list output
from 7B models and execute the commands.

* feat: add bash tool description validation and improve 7B model parsing

Changes:
- Added _ensure_tool_arguments() function to inject 'description' field
- Updated tool_instructions.txt to require description for bash tool
- Improved 7B model command extraction with better regex patterns
- Added 'create-react-app' to command detection list
- Updated delta field type to Dict[str, Any] for streaming
- Added GGUF to MLX quantization mapping for registry.py
- Clarified agent responsibilities in AGENT_REVIEW.md and AGENT_WORKER.md

Fixes:
- Bash tool now validates required 'description' field
- 7B model output parsed more reliably (numbered lists)
- Multiple commands chained with && for sequential execution

Token count: 69 tokens (down from 86, -19.8%)

All tests pass: 13/13

* feat: add webfetch tool support with URL extraction

Changes:
- Added webfetch to tool instructions config
- Added URL extraction pattern to parse_tool_calls()
- Parser now recognizes URLs and creates webfetch tool calls
- Updated token count: 89 tokens (+29% from 69)

The webfetch tool is available through opencode environment.
System prompt adjustment enables model to use it for URL fetching.

Token budget: 89 tokens (4.45% of 2000 limit)
Tests pass: 13/13
2026-02-24 22:35:05 +01:00


Agent Worker Rules

⚠️ IMPORTANT: This document is for IMPLEMENTATION AGENTS (coding, testing, documentation). DO NOT MAKE COMMITS - that's the AGENT_REVIEW.md agent's job.

Pre-Flight Checklist (MUST complete before coding)

⚠️ GIT OPERATIONS REMINDER

DO NOT make commits. Commits are ONLY handled by AGENT_REVIEW.md agents. You CAN create branches and stage files (git add), but DO NOT commit (git commit).

1. Token Budget Verification

  • System prompt + instructions ≤ 2000 tokens (hard limit)
  • Leave ≥ 50% of context window for user input
  • If adding documentation/examples, remove old ones to maintain budget
  • Use tiktoken or estimate: ~4 chars = 1 token
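
A quick way to verify, a minimal sketch assuming tiktoken is installed (cl100k_base is a stand-in encoding; the exact tokenizer depends on your model):

import tiktoken

TOKEN_BUDGET = 2000  # hard limit from this checklist

def check_token_budget(prompt: str) -> int:
    """Count tokens and fail loudly if the prompt exceeds the budget."""
    enc = tiktoken.get_encoding("cl100k_base")
    count = len(enc.encode(prompt))
    assert count <= TOKEN_BUDGET, f"{count} tokens exceeds the {TOKEN_BUDGET} budget"
    return count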

2. Test Plan Required

Before writing ANY code, write a test plan:

## Test Plan for [Feature]

### Unit Tests
- [ ] Test case 1: [specific input] → [expected output]
- [ ] Test case 2: [edge case]
- [ ] Test case 3: [error condition]

### Integration Tests  
- [ ] End-to-end flow: [steps]
- [ ] Expected result: [what success looks like]

### Manual Verification
- [ ] Command to run: [exact command]
- [ ] Expected output: [what to see]

3. Design Decision Document

For any change > 50 lines:

## Design Decision

### Problem
[What are we solving?]

### Options Considered
1. [Option A] - Pros: ..., Cons: ...
2. [Option B] - Pros: ..., Cons: ...

### Decision
[Which option and WHY]

### Impact
- Token count change: [+/- X tokens]
- Breaking changes: [Yes/No]
- Migration needed: [Yes/No]

Coding Rules

Rule 1: One Feature = One Commit

NOTE: Regular agents DO NOT make commits. AGENT_REVIEW.md agents handle commits.

When AGENT_REVIEW.md agents make commits:

  • Never combine unrelated changes in one commit
  • If you fix a bug AND refactor, make 2 commits
  • Commit message format: type(scope): description
    • Types: feat, fix, refactor, test, docs, chore
    • Example: feat(tools): add working directory support

Rule 2: Tests First (TDD)

# BAD: Write code, maybe test later
def parse_tools(text):
    # ... implementation ...
    pass

# GOOD: Write test first
def test_parse_simple_tool():
    text = 'TOOL: read\nARGUMENTS: {"filePath": "test.txt"}'
    content, tools = parse_tool_calls(text)
    assert len(tools) == 1
    assert tools[0]["function"]["name"] == "read"

# Then write minimal code to pass

Rule 3: No Production Debugging

  • NEVER add print() statements for debugging
  • Use logging module with appropriate levels
  • Remove ALL debug logging before committing
  • Exception: Structured logging for observability (metrics, errors)
# BAD
def process_request(request):
    print(f"DEBUG: Got request {request}")  # REMOVE THIS
    result = handle(request)
    print(f"DEBUG: Result {result}")  # REMOVE THIS
    return result

# GOOD
import logging

logger = logging.getLogger(__name__)

def process_request(request):
    logger.debug("Processing request", extra={"request_id": request.id})
    result = handle(request)
    return result

Rule 4: Architecture Consistency

  • Check ARCHITECTURE.md before changing patterns
  • If unsure, ask in PR description
  • NEVER change architecture in a "fix" commit
  • Architecture changes require design doc + team review

Rule 5: Parse Once, Parse Well

  • ONE parser per format
  • If adding new format, remove old one
  • Parser must handle all documented cases
  • Parser must fail gracefully (return empty, not crash)
# BAD: Multiple parsers for same thing
def parse_tools_v1(text): ...
def parse_tools_v2(text): ...
def parse_tools_legacy(text): ...

# GOOD: Single parser with clear regex
import re
from typing import List, Tuple

TOOL_PATTERN = r'TOOL:\s*(\w+)\s*\nARGUMENTS:\s*(\{[^}]*\})'

def parse_tool_calls(text: str) -> Tuple[str, List[dict]]:
    matches = list(re.finditer(TOOL_PATTERN, text, re.IGNORECASE))
    if not matches:
        return text, []  # fail gracefully: no tool calls, keep original text
    tools = [
        {"function": {"name": m.group(1), "arguments": m.group(2)}}
        for m in matches
    ]
    # Strip matched tool-call lines from the remaining content
    content = re.sub(TOOL_PATTERN, "", text, flags=re.IGNORECASE).strip()
    return content, tools

Rule 6: Token-Aware Documentation

  • Every docstring/example has a token cost
  • Count tokens before adding
  • If over budget, remove something else
  • Prioritize: Code clarity > Examples > Explanations
# BAD: 150 tokens of fluff
def calculate(x, y):
    """
    This function calculates the sum of two numbers.
    
    The sum is calculated by using the built-in Python 
    addition operator which adds the values together.
    
    Args:
        x (int): The first number to add
        y (int): The second number to add
        
    Returns:
        int: The sum of x and y
        
    Example:
        >>> calculate(1, 2)
        3
    """
    return x + y

# GOOD: 20 tokens, clear enough
def calculate(x: int, y: int) -> int:
    """Return sum of x and y."""
    return x + y

Rule 7: Clear Error Messages

  • Every error must tell user EXACTLY what went wrong
  • Include context: what was expected vs what was received
  • Suggest fix if possible
# BAD
raise ValueError("Invalid input")

# GOOD
raise ValueError(f"Invalid model format: '{model_str}'. Expected: 'name:size:quant' (e.g., 'qwen:7b:q4')")

Rule 8: No Circular Imports

# BAD: src/a.py imports src/b.py, src/b.py imports src/a.py

# GOOD: Use dependency injection or move shared code to common module
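
A minimal sketch of the dependency-injection option (class names are illustrative, not this codebase's actual API):

class ToolParser:
    """Lives in its own module; knows nothing about execution."""
    def parse(self, text: str) -> list:
        return []  # real parsing logic goes here

class ToolExecutor:
    """Receives its parser at construction time - no import cycle."""
    def __init__(self, parser: ToolParser):
        self.parser = parser

    def run(self, text: str) -> list:
        return self.parser.parse(text)

executor = ToolExecutor(ToolParser())  # wiring happens in one place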

Git Workflow Rules

CRITICAL: Commit Handling

REGULAR AGENTS: DO NOT MAKE COMMITS

  • Regular agents do NOT create commits, pull requests, or manage git history
  • Commits are ONLY handled by agents following AGENT_REVIEW.md guidelines
  • If you need to commit code, the AGENT_REVIEW.md agent should handle it
  • Exception: You may manually stage files (git add) for the review agent
  • You CAN create and checkout branches (that's fine) - just don't commit to them

Branch Strategy

Main Branches (Protected):

  • main - Production-ready code only
  • develop - Integration branch for features (optional for small projects)

Working Branches (temporary; commits by AGENT_REVIEW.md agents only):

feature/description           # New features
fix/description               # Bug fixes  
refactor/description          # Code refactoring
hotfix/description            # Critical production fixes
docs/description              # Documentation only
experiment/description        # Experimental work (may be deleted)

Note: Regular agents may create and switch branches, but must NOT commit or perform other git operations

Workflow Steps

1. Starting New Work

# ALWAYS start from main
git checkout main
git pull origin main

# Create feature branch
git checkout -b feature/description

# Push branch to remote immediately
git push -u origin feature/description

2. During Development

# Commit often (small, logical commits)
git add -p  # Stage interactively (review each change)
git commit -m "feat(scope): description"

# Push regularly (backup)
git push origin feature/description

# Keep up-to-date with main
git fetch origin
git rebase origin/main  # Resolve conflicts immediately

3. Before PR (Final Cleanup)

# Interactive rebase to clean history
git rebase -i main

# Squash these:
# - "fix typo"
# - "WIP"
# - "asdf"
# - "omg finally"
# - Multiple attempts at same fix

# Keep separate:
# - Logical feature steps
# - Refactoring separate from features
# - Test additions separate from code changes

4. Creating PR

  • Push final branch: git push origin feature/description
  • Create PR to main (not develop unless project uses git-flow)
  • Fill PR template completely
  • Request review from an AGENT_REVIEW.md-qualified reviewer
  • Link related issues: Closes #123, Fixes #456

Commit Rules

Commit Frequency:

  • Commit after each logical step (not just at end of day)
  • Each commit should leave codebase in working state
  • "Work in progress" commits OK on feature branches (clean before PR)

Commit Size:

  • Max 200 lines changed per commit
  • Max 5 files changed per commit (unless related)
  • Each commit reviewable in 5 minutes
  • Split large changes:
    # BAD: One giant commit
    git commit -am "Add federation + fix bugs + refactor + docs"
    
    # GOOD: Separate commits
    git commit -m "refactor(network): extract peer discovery logic"
    git commit -m "feat(federation): implement cross-swarm voting"
    git commit -m "fix(federation): handle peer timeout edge case"
    git commit -m "docs: update federation architecture docs"
    

Commit Message Format:

type(scope): subject (50 chars or less)

Body (wrap at 72 chars):
- Why this change was made
- What problem it solves  
- Any breaking changes or migration notes

Refs: #123, #456

Types:

  • feat: New feature
  • fix: Bug fix
  • refactor: Code restructuring (no behavior change)
  • test: Adding/updating tests
  • docs: Documentation only
  • chore: Build, dependencies, tooling
  • perf: Performance improvement
  • style: Formatting (no code change)

Subject Rules:

  • Use imperative mood: "Add feature" not "Added feature"
  • No period at end
  • Lowercase after type
  • Max 50 characters

Branch Hygiene

DO:

  • Create branch from latest main
  • Use descriptive branch names
  • Push branch to remote immediately
  • Rebase onto main regularly
  • Delete merged branches
  • Squash fixup commits before PR

DON'T:

  • Commit directly to main
  • Have long-lived branches (>1 week without rebase)
  • Include unrelated changes in one branch
  • Commit broken code (even temporarily)
  • Force push to shared branches
  • Merge without review

Handling Conflicts

# While rebasing
git rebase main
# Conflicts happen...

# Resolve conflicts in files
git add <resolved-files>
git rebase --continue

# If messed up, abort
git rebase --abort

Conflict Resolution Rules:

  1. Understand both changes before resolving
  2. Don't just pick "ours" or "theirs"
  3. Test after resolving
  4. Commit message should explain resolution

Emergency Procedures

Committed to wrong branch:

# Undo last commit (keep changes)
git reset HEAD~1

# Stash changes
git stash

# Switch to correct branch
git checkout correct-branch

# Apply changes
git stash pop

# Commit properly
git commit -m "..."

Need to undo pushed commit:

# Revert (creates new commit, safe for shared history)
git revert <commit-hash>
git push origin branch-name

# OR if feature branch not shared yet
# Reset and force push (DANGEROUS)
git reset --hard HEAD~1
git push --force-with-lease origin branch-name

Release Process

NOTE: Release process should be handled by AGENT_REVIEW.md agents.

# Create release branch
git checkout -b release/v1.2.0

# Bump version, update changelog
git commit -m "chore: bump version to 1.2.0"

# Tag release
git tag -a v1.2.0 -m "Release version 1.2.0"
git push origin v1.2.0

# Merge to main
git checkout main
git merge --no-ff release/v1.2.0
git push origin main

# Delete release branch
git branch -d release/v1.2.0

What Regular Agents Should NOT Do

REGULAR AGENTS DO NOT:

  • Make commits (git commit)
  • Create pull requests
  • Push to remote repositories
  • Merge branches
  • Manage git history (rebase, reset, etc.)
  • Delete branches

REGULAR AGENTS CAN:

  • Create and checkout branches (git checkout -b)
  • Stage files for review (git add)
  • Switch between branches

REGULAR AGENTS SHOULD:

  • Write code and tests
  • Run tests locally
  • Use logging instead of print()
  • Follow code quality standards
  • Document changes in code comments or design docs
  • Hand off completed work to AGENT_REVIEW.md agent for commit/PR creation

Example Workflow:

1. Agent reads task from user
2. Agent creates feature branch (git checkout -b feature/name)
3. Agent implements feature (writes code, tests, docs)
4. Agent stages changes for review (git add)
5. Agent reports completion with summary of changes
6. AGENT_REVIEW.md agent:
   - Reviews code quality
   - Makes commits
   - Creates PR

Pre-Commit Checklist

  • Code passes pytest (if tests exist)
  • No print() statements (use logging)
  • No bare except: clauses
  • All functions have type hints
  • All public functions have docstrings
  • No TODO comments (create issues instead)
  • Token count checked (if modifying prompts)

Testing Requirements

Unit Test Coverage

Minimum 80% coverage for:

  • Parsing functions
  • Business logic
  • State machines

Integration Tests Required For:

  • API endpoints
  • Tool execution
  • File operations
  • Network calls (mocked)
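
A minimal sketch of a mocked network-call test, assuming the requests library (fetch_title is a hypothetical function under test):

from unittest.mock import MagicMock, patch

import requests

def fetch_title(url: str) -> str:
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    return resp.text

def test_fetch_title_without_network():
    fake = MagicMock(status_code=200, text="ok")  # stand-in response
    with patch("requests.get", return_value=fake) as mock_get:
        assert fetch_title("https://example.com") == "ok"
        mock_get.assert_called_once_with("https://example.com", timeout=10)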

Test File Structure

tests/
├── unit/
│   ├── test_parser.py
│   ├── test_executor.py
│   └── test_consensus.py
├── integration/
│   ├── test_api.py
│   └── test_tools.py
└── fixtures/
    └── sample_responses.json

Code Quality Standards

Python Style

  • Follow PEP 8
  • Use type hints for all function signatures
  • Max line length: 100 characters
  • Max function length: 50 lines
  • Max file length: 300 lines (split if larger)

Imports (Order Matters)

# 1. Standard library
import os
import sys
from typing import List

# 2. Third party
import numpy as np
from fastapi import APIRouter

# 3. Local (absolute imports only)
from src.tools.executor import ToolExecutor
from src.swarm.manager import SwarmManager

Documentation Standards

Every module must have:

"""Module purpose in one line.

Longer description if needed (2-3 sentences max).
"""

Every public function must have:

def process_data(data: dict, options: Optional[dict] = None) -> Result:
    """Process data with given options.
    
    Args:
        data: Input data to process
        options: Processing options (default: None)
        
    Returns:
        Processed result
        
    Raises:
        ValueError: If data is invalid
    """

Architecture Rules

No Feature Flags in Core Logic

# BAD
if config.get("USE_NEW_PARSER", False):
    result = new_parser(text)
else:
    result = old_parser(text)

# GOOD: Pick one, remove the other
def parse_tool_calls(text: str) -> Tuple[str, List[dict]]:
    """Parse tool calls from text."""
    # Single implementation

No Code Duplication

  • If you copy-paste > 3 lines, extract to function
  • Shared code goes in src/common/ or src/utils/
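
For example (normalize_model_name is a hypothetical shared helper):

# Extracted once into src/utils/ instead of pasted into routes and CLI
def normalize_model_name(raw: str) -> str:
    """Canonical model name used by every caller."""
    return raw.strip().lower().replace(" ", "-")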

Separation of Concerns

src/
├── parser/       # Only parsing logic
├── executor/     # Only execution logic
├── formatter/    # Only formatting/output
└── integration/  # Only API glue code

Forbidden Patterns

Never Do These:

  1. Bare except clauses - Always catch specific exceptions
  2. Production debugging - No print(), use logging
  3. Multiple return formats - One function = one return type
  4. Silent failures - Always log/report errors
  5. Magic numbers - Use named constants
  6. Global state - Use dependency injection
  7. Deep nesting - Max 3 levels of indentation
  8. Circular dependencies - Re-architect if needed
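
A minimal sketch pulling several of these rules together (load_config is a hypothetical helper):

import json
import logging

logger = logging.getLogger(__name__)

MAX_CONFIG_BYTES = 1_000_000  # named constant, not a magic number

def load_config(path: str) -> dict:
    try:
        with open(path) as f:
            raw = f.read(MAX_CONFIG_BYTES)
        return json.loads(raw)
    except (OSError, json.JSONDecodeError) as exc:  # specific, never bare except
        logger.error("Failed to load config %s: %s", path, exc)  # no silent failure
        raise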

Review Preparation

Before marking PR ready:

  1. Self-Review Checklist (check each item):

    • Tests pass: pytest -v
    • Type checking: mypy src/
    • Linting: ruff check src/
    • Formatting: black src/
    • Token count verified (if applicable)
    • No debug code left in
    • Commit messages follow format
    • Documentation updated
  2. PR Description Template:

    ## Changes
    - [Brief description]
    
    ## Testing
    - [How you tested it]
    
    ## Token Impact (if applicable)
    - Before: X tokens
    - After: Y tokens
    - Change: +/- Z tokens
    
    ## Checklist
    - [ ] Tests added/updated
    - [ ] Documentation updated
    - [ ] Self-review completed
    
  3. Run Final Verification:

    # Run all checks
    pytest && mypy src/ && ruff check src/ && black --check src/
    

Continuous Learning & Research

You MUST periodically research best practices and alternative implementations. This prevents stagnation and ensures we're using proven approaches.

When to Research

Before Major Features:

  • Spend 15-30 minutes researching similar implementations
  • Check: GitHub, Stack Overflow, official docs, research papers
  • Document findings in PR description

Monthly Reviews:

  • Review project's core technologies for updates
  • Check if better libraries/algorithms exist
  • Look for deprecated patterns we're using

When Stuck:

  • Don't brute force a solution
  • Research how others solved similar problems
  • Consider if problem indicates architectural issue

What to Research

1. Best Practices

# Search queries to use:
"python async best practices 2024"
"fastapi error handling patterns"
"LLM consensus voting algorithms"
"gguf quantization comparison"

2. Similar Implementations

  • Search GitHub for similar projects
  • Read their architecture decisions
  • Check their issues for pitfalls they hit
  • Note: Don't copy code blindly; understand WHY

3. Research Papers & Benchmarks

  • For consensus algorithms
  • For quantization strategies
  • For context window optimization
  • For distributed systems patterns

4. Library Updates

  • Check CHANGELOG of major dependencies
  • Review migration guides
  • Test new features in separate branch

Documentation of Research

Create research/YYYY-MM-DD-topic.md for significant findings:

# Research: [Topic]

**Date:** YYYY-MM-DD
**Researcher:** [Name]
**Trigger:** [Why researched this]

## Findings

### Option 1: [Name]
- Source: [Link]
- Pros: ...
- Cons: ...
- Complexity: Low/Medium/High

### Option 2: [Name]
- Source: [Link]
- Pros: ...
- Cons: ...
- Complexity: Low/Medium/High

## Recommendation
[Which option and WHY]

## Implementation Notes
[Specific code changes needed]

## Risks
[What could go wrong]

Research Checklist

Before implementing:

  • Searched for similar open-source implementations
  • Checked recent best practices (2023+)
  • Looked for benchmarking data if applicable
  • Reviewed alternative approaches
  • Considered long-term maintenance implications

After implementing:

  • Documented why chosen approach was selected
  • Added comments linking to research sources
  • Created test comparing against alternatives (if applicable)

Example Research Topics

Immediate:

  • "Python type hints best practices 2024"
  • "FastAPI dependency injection patterns"
  • "LLM tool use format comparison"

Short-term:

  • "Consensus algorithms for distributed LLM systems"
  • "Context window compression techniques"
  • "GGUF quantization vs other formats"

Long-term:

  • "Speculative decoding implementation"
  • "PagedAttention for multiple workers"
  • "RAG integration patterns"

Research Sources

Reliable:

  • Official documentation (Python, FastAPI, etc.)
  • Well-maintained GitHub repos (>1k stars, active)
  • Recent conference talks (PyCon, NeurIPS, etc.)
  • Research papers with code (Papers With Code)
  • Official blogs (Python.org, FastAPI.tiangolo.com)

Use with Caution:

  • Medium articles (variable quality)
  • Old Stack Overflow answers (>2 years)
  • Tutorial sites (often outdated)
  • YouTube videos (hard to verify)

Integration with Development

Weekly:

  • Spend 30 minutes reading about one technology we use
  • Note any improvements we could make
  • Create issues for promising findings

Monthly:

  • Review all open research issues
  • Prioritize based on impact vs effort
  • Schedule implementation of high-value items

Quarterly:

  • Architecture review: Are our patterns still best?
  • Dependency audit: Updates needed?
  • Performance review: Could we be faster?

Remember:

  • Research prevents reinvention of the wheel
  • But don't research forever - timebox it (30 min max for most decisions)
  • Document findings so others don't repeat the research
  • Apply critical thinking - "best practice" depends on context

Breaking This Ruleset

If you MUST break a rule:

  1. Document WHY in code comments
  2. Get explicit approval in PR
  3. Create follow-up issue to fix properly
  4. Never break Rule 3 (No Production Debugging)

Remember: Quality over speed. A fix that takes 2 days with tests is better than a fix that takes 2 hours and breaks 3 other things.