Local Swarm - Complete Documentation

Table of Contents

  1. Quick Start Guide
  2. Opencode Configuration
  3. API Reference
  4. Troubleshooting
  5. Advanced Configuration
  6. Performance Tuning
  7. MCP Server Configuration
  8. Network Federation

Quick Start Guide

Installation

Windows:

git clone https://github.com/yourusername/local_swarm.git
cd local_swarm
.\scripts\install.bat

macOS/Linux:

git clone https://github.com/yourusername/local_swarm.git
cd local_swarm
chmod +x scripts/install.sh
./scripts/install.sh

Android (Termux):

git clone https://github.com/yourusername/local_swarm.git
cd local_swarm
chmod +x scripts/install-termux.sh
./scripts/install-termux.sh

First Run

# Start with interactive menu
python main.py

# Or skip menu with auto-detection
python main.py --auto

Opencode Configuration

Basic Configuration

Add to your opencode configuration file (usually ~/.config/opencode/config.json):

{
  "model": {
    "provider": "openai",
    "base_url": "http://localhost:8000/v1",
    "api_key": "not-needed",
    "model": "local-swarm"
  }
}

Configuration with Local Swarm on a Different Machine

If Local Swarm is running on another computer in your network:

{
  "model": {
    "provider": "openai",
    "base_url": "http://192.168.1.100:8000/v1",
    "api_key": "not-needed",
    "model": "local-swarm"
  }
}

Multiple Model Options

You can configure multiple models and switch between them:

{
  "models": {
    "local-swarm": {
      "provider": "openai",
      "base_url": "http://localhost:8000/v1",
      "api_key": "not-needed",
      "model": "local-swarm"
    },
    "local-swarm-fast": {
      "provider": "openai",
      "base_url": "http://localhost:8000/v1",
      "api_key": "not-needed",
      "model": "local-swarm",
      "temperature": 0.2
    }
  },
  "default_model": "local-swarm"
}

With Context Window Configuration

{
  "model": {
    "provider": "openai",
    "base_url": "http://localhost:8000/v1",
    "api_key": "not-needed",
    "model": "local-swarm",
    "max_tokens": 4096,
    "temperature": 0.7
  }
}

Environment-Specific Configurations

Development (local only):

{
  "model": {
    "provider": "openai",
    "base_url": "http://localhost:8000/v1",
    "api_key": "not-needed",
    "model": "local-swarm",
    "temperature": 0.8
  }
}

Production (federated swarm):

{
  "model": {
    "provider": "openai",
    "base_url": "http://swarm-coordinator.local:8000/v1",
    "api_key": "not-needed",
    "model": "local-swarm",
    "temperature": 0.5
  }
}

Testing the Configuration

After configuring opencode, test with:

# Simple test
opencode --version

# Test with a prompt
echo "Write a Python function to calculate factorial" | opencode

API Reference

OpenAI-Compatible Endpoints

Local Swarm implements the OpenAI API specification.

POST /v1/chat/completions

Generate a chat completion.

Request:

{
  "model": "local-swarm",
  "messages": [
    {"role": "user", "content": "Write a Python function to calculate factorial"}
  ],
  "max_tokens": 2048,
  "temperature": 0.7,
  "stream": false
}

Response:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "local-swarm",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "def factorial(n):\n    if n <= 1:\n        return 1\n    return n * factorial(n-1)"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 25,
    "total_tokens": 40
  }
}
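As a minimal client sketch, the request/response round trip above can be scripted with the Python standard library (this assumes the server is reachable at http://localhost:8000; adjust base_url for remote setups):

```python
import json
import urllib.request

def build_chat_request(prompt, model="local-swarm", max_tokens=2048, temperature=0.7):
    """Assemble a /v1/chat/completions payload matching the schema above."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
        "stream": False,
    }

def chat(prompt, base_url="http://localhost:8000"):
    """POST the payload and return the assistant's reply text."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Because the API is OpenAI-compatible, any OpenAI SDK pointed at base_url with a dummy API key should work the same way.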

GET /v1/models

List available models.

Response:

{
  "object": "list",
  "data": [
    {
      "id": "local-swarm",
      "object": "model",
      "created": 1234567890,
      "owned_by": "local-swarm"
    }
  ]
}

GET /health

Check health status.

Response:

{
  "status": "healthy",
  "version": "0.1.0",
  "workers": 5,
  "model": "Qwen 2.5 Coder 7b (q4_k_m)"
}

Federation Endpoints (when enabled)

GET /v1/federation/status

{
  "enabled": true,
  "total_peers": 3,
  "healthy_peers": 3,
  "strategy": "weighted"
}

GET /v1/federation/peers

{
  "peers": [
    {
      "name": "desktop-pc",
      "host": "192.168.1.100",
      "port": 8000,
      "model_id": "qwen2.5-coder:7b:q4_k_m",
      "instances": 3
    }
  ]
}
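A client consuming this endpoint might turn the peer list into OpenAI-style base URLs to rotate through. The helper below is an illustrative sketch (not part of Local Swarm), assuming only the response shape shown above:

```python
def peer_base_urls(peers_response):
    """Convert a /v1/federation/peers response into OpenAI-style base URLs."""
    return [
        f"http://{p['host']}:{p['port']}/v1"
        for p in peers_response.get("peers", [])
    ]
```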

Troubleshooting

Common Issues

Issue: "No module named 'llama_cpp'"

Solution:

# Install with pre-built wheel (recommended)
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121

# Or CPU-only
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu

Issue: "CUDA not detected" on Windows

Solution:

  1. Install NVIDIA drivers: https://www.nvidia.com/drivers
  2. Verify with: nvidia-smi
  3. Reinstall with CUDA support:
pip uninstall llama-cpp-python
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121

Issue: "Out of memory" errors

Solution:

# Reduce instances
python main.py --instances 2

# Or use smaller model
python main.py --model qwen2.5-coder:3b:q4

Issue: Slow performance on CPU

Solution:

  • Use smaller models (3B instead of 7B)
  • Use Q4 quantization instead of Q6
  • Reduce number of instances to 2-3
  • Close other applications

Issue: "No suitable model found"

Solution: This usually means your system has less than 2GB of memory available. Try:

  • Close other applications
  • Use CPU-only mode (automatic if no GPU)
  • Add more RAM or use a machine with GPU

Issue: Models not downloading

Solution:

# Check internet connection
ping huggingface.co

# Try manual download
python main.py --download-only

# Check cache directory
ls ~/.local_swarm/models

Platform-Specific Issues

Windows:

  • Ensure Python is in PATH
  • Run PowerShell as Administrator if needed
  • Install Visual C++ Redistributable

macOS:

  • Xcode Command Line Tools: xcode-select --install
  • May need to allow llama.cpp in Security preferences

Linux:

  • Install build essentials: sudo apt-get install build-essential
  • For AMD: Install ROCm drivers
  • For Intel: Install oneAPI toolkit

Advanced Configuration

Configuration File (config.yaml)

Create config.yaml in the project root:

server:
  host: "127.0.0.1"
  port: 8000

swarm:
  consensus_strategy: "similarity"  # similarity, quality, fastest
  min_instances: 2
  max_instances: 5

federation:
  enabled: false
  discovery_port: 8765
  federation_port: 8766
  max_peers: 10

hardware:
  gpu_memory_fraction: 1.0  # Use 100% of GPU VRAM
  ram_fraction: 0.5  # Use 50% of system RAM for CPU

models:
  cache_dir: "~/.local_swarm/models"
  preferred_models:
    - qwen2.5-coder
    - deepseek-coder

Environment Variables

# Custom cache directory
export LOCAL_SWARM_CACHE_DIR="/path/to/models"

# Debug mode
export LOCAL_SWARM_DEBUG=1

# Custom config file
export LOCAL_SWARM_CONFIG="/path/to/config.yaml"
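A script that interoperates with Local Swarm might resolve these variables like this. This is a sketch under the assumption that an environment variable, when set, takes precedence over the built-in default:

```python
import os
from pathlib import Path

def resolve_cache_dir(env=os.environ):
    """LOCAL_SWARM_CACHE_DIR wins if set; otherwise fall back to the default."""
    override = env.get("LOCAL_SWARM_CACHE_DIR")
    if override:
        return Path(override).expanduser()
    return Path("~/.local_swarm/models").expanduser()

def debug_enabled(env=os.environ):
    """Treat LOCAL_SWARM_DEBUG=1 as on; unset, empty, or 0 as off."""
    return env.get("LOCAL_SWARM_DEBUG", "0") not in ("", "0")
```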

Performance Tuning

For Maximum Speed

# Use smaller model
python main.py --model qwen2.5-coder:3b:q4

# Reduce instances (less memory contention)
python main.py --instances 2

# Skip consensus (single worker)
# Edit config: consensus_strategy: "fastest"

For Maximum Quality

# Use largest model that fits
python main.py --model qwen2.5-coder:7b:q6

# More instances for better consensus
python main.py --instances 5

# Use quality consensus strategy
# Edit config: consensus_strategy: "quality"

For Balanced Performance

# Recommended defaults (automatic)
python main.py

# Or explicitly
python main.py --model qwen2.5-coder:7b:q4

Memory Usage by Model

Model Size   Q4 VRAM    Q5 VRAM     Q6 VRAM
1B-3B        0.7-2GB    0.9-2.5GB   1.1-3GB
7B           4.5GB      5.2GB       6.0GB
13B-15B     8-9GB      9.5-11GB    11-13GB

Recommended: Use Q4_K_M for best speed/quality balance.
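For a pre-flight check, the table can be encoded in a small lookup. This is an illustrative helper (the names are hypothetical, and the 13B-15B row uses the table's upper bounds):

```python
# Approximate VRAM needs in GB, taken from the table above.
VRAM_GB = {
    ("7b", "q4"): 4.5,
    ("7b", "q5"): 5.2,
    ("7b", "q6"): 6.0,
    ("13b", "q4"): 9.0,   # upper end of the 13B-15B row
    ("13b", "q5"): 11.0,
    ("13b", "q6"): 13.0,
}

def fits(size, quant, available_gb):
    """Rough pre-flight check: does this model/quant fit in available VRAM?"""
    need = VRAM_GB.get((size, quant))
    return need is not None and need <= available_gb
```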


MCP Server Configuration

Enable MCP Server

python main.py --mcp

MCP Tools Available

When MCP is enabled, AI assistants can use:

  • get_hardware_info - Query system capabilities
  • get_swarm_status - Check swarm health
  • generate_code - Generate with consensus
  • list_available_models - Browse models
  • get_worker_details - Worker statistics

Testing MCP

# List available tools
mcp-cli call local-swarm list_tools

# Call a tool
mcp-cli call local-swarm call_tool get_swarm_status

Network Federation

Setup Federated Swarm

On each machine in your network:

# Machine 1 (Windows PC with RTX 4060)
python main.py --federation --port 8000

# Machine 2 (Mac Mini M1)
python main.py --federation --port 8000

# Machine 3 (Linux with AMD GPU)
python main.py --federation --port 8000

Machines will auto-discover each other via mDNS.

Verify Federation

curl http://localhost:8000/v1/federation/status
curl http://localhost:8000/v1/federation/peers

Getting Help

License

MIT License - See LICENSE file