Local Swarm - Complete Documentation

Table of Contents

  1. Quick Start Guide
  2. Opencode Configuration
  3. API Reference
  4. Troubleshooting
  5. Advanced Configuration
  6. Performance Tuning
  7. MCP Server Configuration
  8. Network Federation

Quick Start Guide

Installation

Windows:

git clone https://github.com/yourusername/local_swarm.git
cd local_swarm
.\scripts\install.bat

macOS/Linux:

git clone https://github.com/yourusername/local_swarm.git
cd local_swarm
chmod +x scripts/install.sh
./scripts/install.sh

Android (Termux):

git clone https://github.com/yourusername/local_swarm.git
cd local_swarm
chmod +x scripts/install-termux.sh
./scripts/install-termux.sh

First Run

# Start with interactive menu
python main.py

# Or skip menu with auto-detection
python main.py --auto

Opencode Configuration

Basic Configuration

Add to your opencode configuration file (usually ~/.config/opencode/config.json):

{
  "model": {
    "provider": "openai",
    "base_url": "http://localhost:8000/v1",
    "api_key": "not-needed",
    "model": "local-swarm"
  }
}

Configuration with Local Swarm on a Different Machine

If Local Swarm is running on another computer in your network:

{
  "model": {
    "provider": "openai",
    "base_url": "http://192.168.1.100:8000/v1",
    "api_key": "not-needed",
    "model": "local-swarm"
  }
}

Multiple Model Options

You can configure multiple models and switch between them:

{
  "models": {
    "local-swarm": {
      "provider": "openai",
      "base_url": "http://localhost:8000/v1",
      "api_key": "not-needed",
      "model": "local-swarm"
    },
    "local-swarm-fast": {
      "provider": "openai",
      "base_url": "http://localhost:8000/v1",
      "api_key": "not-needed",
      "model": "local-swarm",
      "temperature": 0.2
    }
  },
  "default_model": "local-swarm"
}

With Context Window Configuration

{
  "model": {
    "provider": "openai",
    "base_url": "http://localhost:8000/v1",
    "api_key": "not-needed",
    "model": "local-swarm",
    "max_tokens": 4096,
    "temperature": 0.7
  }
}

Environment-Specific Configurations

Development (local only):

{
  "model": {
    "provider": "openai",
    "base_url": "http://localhost:8000/v1",
    "api_key": "not-needed",
    "model": "local-swarm",
    "temperature": 0.8
  }
}

Production (federated swarm):

{
  "model": {
    "provider": "openai",
    "base_url": "http://swarm-coordinator.local:8000/v1",
    "api_key": "not-needed",
    "model": "local-swarm",
    "temperature": 0.5
  }
}

Testing the Configuration

After configuring opencode, test with:

# Simple test
opencode --version

# Test with a prompt
echo "Write a Python function to calculate factorial" | opencode

API Reference

OpenAI-Compatible Endpoints

Local Swarm implements the OpenAI API specification.

POST /v1/chat/completions

Generate a chat completion.

Request:

{
  "model": "local-swarm",
  "messages": [
    {"role": "user", "content": "Write a Python function to calculate factorial"}
  ],
  "max_tokens": 2048,
  "temperature": 0.7,
  "stream": false
}

Response:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "local-swarm",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "def factorial(n):\n    if n <= 1:\n        return 1\n    return n * factorial(n-1)"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 25,
    "total_tokens": 40
  }
}
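As a minimal client sketch, the request/response round trip above can be scripted with the Python standard library (this assumes the server is reachable at http://localhost:8000; adjust base_url for remote setups):

```python
import json
import urllib.request

def build_chat_request(prompt, model="local-swarm", max_tokens=2048, temperature=0.7):
    """Assemble a /v1/chat/completions payload matching the schema above."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
        "stream": False,
    }

def chat(prompt, base_url="http://localhost:8000"):
    """POST the payload and return the assistant's reply text."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Because the API is OpenAI-compatible, any OpenAI SDK pointed at base_url with a dummy API key should work the same way.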

GET /v1/models

List available models.

Response:

{
  "object": "list",
  "data": [
    {
      "id": "local-swarm",
      "object": "model",
      "created": 1234567890,
      "owned_by": "local-swarm"
    }
  ]
}

GET /health

Check health status.

Response:

{
  "status": "healthy",
  "version": "0.1.0",
  "workers": 5,
  "model": "Qwen 2.5 Coder 7b (q4_k_m)"
}

Federation Endpoints (when enabled)

GET /v1/federation/status

{
  "enabled": true,
  "total_peers": 3,
  "healthy_peers": 3,
  "strategy": "weighted"
}

GET /v1/federation/peers

{
  "peers": [
    {
      "name": "desktop-pc",
      "host": "192.168.1.100",
      "port": 8000,
      "model_id": "qwen2.5-coder:7b:q4_k_m",
      "instances": 3
    }
  ]
}
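A client consuming this endpoint might turn the peer list into OpenAI-style base URLs to rotate through. The helper below is an illustrative sketch (not part of Local Swarm), assuming only the response shape shown above:

```python
def peer_base_urls(peers_response):
    """Convert a /v1/federation/peers response into OpenAI-style base URLs."""
    return [
        f"http://{p['host']}:{p['port']}/v1"
        for p in peers_response.get("peers", [])
    ]
```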

Troubleshooting

Common Issues

Issue: "No module named 'llama_cpp'"

Solution:

# Install with pre-built wheel (recommended)
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121

# Or CPU-only
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu

Issue: "CUDA not detected" on Windows

Solution:

  1. Install NVIDIA drivers: https://www.nvidia.com/drivers
  2. Verify with: nvidia-smi
  3. Reinstall with CUDA support:
pip uninstall llama-cpp-python
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121

Issue: "Out of memory" errors

Solution:

# Reduce instances
python main.py --instances 2

# Or use smaller model
python main.py --model qwen2.5-coder:3b:q4

Issue: Slow performance on CPU

Solution:

  • Use smaller models (3B instead of 7B)
  • Use Q4 quantization instead of Q6
  • Reduce number of instances to 2-3
  • Close other applications

Issue: "No suitable model found"

Solution: This usually means your system has less than 2GB of memory available. Try:

  • Close other applications
  • Use CPU-only mode (automatic if no GPU)
  • Add more RAM or use a machine with GPU

Issue: Models not downloading

Solution:

# Check internet connection
ping huggingface.co

# Try manual download
python main.py --download-only

# Check cache directory
ls ~/.local_swarm/models

Platform-Specific Issues

Windows:

  • Ensure Python is in PATH
  • Run PowerShell as Administrator if needed
  • Install Visual C++ Redistributable

macOS:

  • Xcode Command Line Tools: xcode-select --install
  • May need to allow llama.cpp in Security preferences

Linux:

  • Install build essentials: sudo apt-get install build-essential
  • For AMD: Install ROCm drivers
  • For Intel: Install oneAPI toolkit

Advanced Configuration

Configuration File (config.yaml)

Create config.yaml in the project root:

server:
  host: "127.0.0.1"
  port: 8000

swarm:
  consensus_strategy: "similarity"  # similarity, quality, fastest
  min_instances: 2
  max_instances: 5

federation:
  enabled: false
  discovery_port: 8765
  federation_port: 8766
  max_peers: 10

hardware:
  gpu_memory_fraction: 1.0  # Use 100% of GPU VRAM
  ram_fraction: 0.5  # Use 50% of system RAM for CPU

models:
  cache_dir: "~/.local_swarm/models"
  preferred_models:
    - qwen2.5-coder
    - deepseek-coder

Environment Variables

# Custom cache directory
export LOCAL_SWARM_CACHE_DIR="/path/to/models"

# Debug mode
export LOCAL_SWARM_DEBUG=1

# Custom config file
export LOCAL_SWARM_CONFIG="/path/to/config.yaml"
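A script that interoperates with Local Swarm might resolve these variables like this. This is a sketch under the assumption that an environment variable, when set, takes precedence over the built-in default:

```python
import os
from pathlib import Path

def resolve_cache_dir(env=os.environ):
    """LOCAL_SWARM_CACHE_DIR wins if set; otherwise fall back to the default."""
    override = env.get("LOCAL_SWARM_CACHE_DIR")
    if override:
        return Path(override).expanduser()
    return Path("~/.local_swarm/models").expanduser()

def debug_enabled(env=os.environ):
    """Treat LOCAL_SWARM_DEBUG=1 as on; unset, empty, or 0 as off."""
    return env.get("LOCAL_SWARM_DEBUG", "0") not in ("", "0")
```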

Performance Tuning

For Maximum Speed

# Use smaller model
python main.py --model qwen2.5-coder:3b:q4

# Reduce instances (less memory contention)
python main.py --instances 2

# Skip consensus (single worker)
# Edit config: consensus_strategy: "fastest"

For Maximum Quality

# Use largest model that fits
python main.py --model qwen2.5-coder:7b:q6

# More instances for better consensus
python main.py --instances 5

# Use quality consensus strategy
# Edit config: consensus_strategy: "quality"

For Balanced Performance

# Recommended defaults (automatic)
python main.py

# Or explicitly
python main.py --model qwen2.5-coder:7b:q4

Memory Usage by Model

Model Size   Q4 VRAM    Q5 VRAM     Q6 VRAM
1B-3B        0.7-2GB    0.9-2.5GB   1.1-3GB
7B           4.5GB      5.2GB       6.0GB
13B-15B     8-9GB      9.5-11GB    11-13GB

Recommended: Use Q4_K_M for best speed/quality balance.
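For a pre-flight check, the table can be encoded in a small lookup. This is an illustrative helper (the names are hypothetical, and the 13B-15B row uses the table's upper bounds):

```python
# Approximate VRAM needs in GB, taken from the table above.
VRAM_GB = {
    ("7b", "q4"): 4.5,
    ("7b", "q5"): 5.2,
    ("7b", "q6"): 6.0,
    ("13b", "q4"): 9.0,   # upper end of the 13B-15B row
    ("13b", "q5"): 11.0,
    ("13b", "q6"): 13.0,
}

def fits(size, quant, available_gb):
    """Rough pre-flight check: does this model/quant fit in available VRAM?"""
    need = VRAM_GB.get((size, quant))
    return need is not None and need <= available_gb
```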


MCP Server Configuration

Enable MCP Server

python main.py --mcp

MCP Tools Available

When MCP is enabled, AI assistants can use:

  • get_hardware_info - Query system capabilities
  • get_swarm_status - Check swarm health
  • generate_code - Generate with consensus
  • list_available_models - Browse models
  • get_worker_details - Worker statistics

Testing MCP

# List available tools
mcp-cli call local-swarm list_tools

# Call a tool
mcp-cli call local-swarm call_tool get_swarm_status

Network Federation

Setup Federated Swarm

On each machine in your network:

# Machine 1 (Windows PC with RTX 4060)
python main.py --federation --port 8000

# Machine 2 (Mac Mini M1)
python main.py --federation --port 8000

# Machine 3 (Linux with AMD GPU)
python main.py --federation --port 8000

Machines will auto-discover each other via mDNS.

Verify Federation

curl http://localhost:8000/v1/federation/status
curl http://localhost:8000/v1/federation/peers

Getting Help

License

MIT License - See LICENSE file