# Local Swarm - Complete Documentation

## Table of Contents

1. [Quick Start Guide](#quick-start-guide)
2. [Opencode Configuration](#opencode-configuration)
3. [API Reference](#api-reference)
4. [Troubleshooting](#troubleshooting)
5. [Advanced Configuration](#advanced-configuration)
6. [Performance Tuning](#performance-tuning)
7. [MCP Server Configuration](#mcp-server-configuration)
8. [Network Federation](#network-federation)

---
## Quick Start Guide

### Installation

**Windows:**
```powershell
git clone https://github.com/yourusername/local_swarm.git
cd local_swarm
.\scripts\install.bat
```

**macOS/Linux:**
```bash
git clone https://github.com/yourusername/local_swarm.git
cd local_swarm
chmod +x scripts/install.sh
./scripts/install.sh
```

**Android (Termux):**
```bash
git clone https://github.com/yourusername/local_swarm.git
cd local_swarm
chmod +x scripts/install-termux.sh
./scripts/install-termux.sh
```

### First Run

```bash
# Start with interactive menu
python main.py

# Or skip menu with auto-detection
python main.py --auto
```

---
## Opencode Configuration

### Basic Configuration

Add to your opencode configuration file (usually `~/.config/opencode/config.json`):

```json
{
  "model": {
    "provider": "openai",
    "base_url": "http://localhost:8000/v1",
    "api_key": "not-needed",
    "model": "local-swarm"
  }
}
```
### Remote Machine Setup

If Local Swarm is running on another computer on your network, point `base_url` at that machine's address:

```json
{
  "model": {
    "provider": "openai",
    "base_url": "http://192.168.1.100:8000/v1",
    "api_key": "not-needed",
    "model": "local-swarm"
  }
}
```
### Multiple Model Options

You can configure multiple models and switch between them:

```json
{
  "models": {
    "local-swarm": {
      "provider": "openai",
      "base_url": "http://localhost:8000/v1",
      "api_key": "not-needed",
      "model": "local-swarm"
    },
    "local-swarm-fast": {
      "provider": "openai",
      "base_url": "http://localhost:8000/v1",
      "api_key": "not-needed",
      "model": "local-swarm",
      "temperature": 0.2
    }
  },
  "default_model": "local-swarm"
}
```
### With Context Window Configuration

```json
{
  "model": {
    "provider": "openai",
    "base_url": "http://localhost:8000/v1",
    "api_key": "not-needed",
    "model": "local-swarm",
    "max_tokens": 4096,
    "temperature": 0.7
  }
}
```
### Environment-Specific Configurations

**Development (local only):**
```json
{
  "model": {
    "provider": "openai",
    "base_url": "http://localhost:8000/v1",
    "api_key": "not-needed",
    "model": "local-swarm",
    "temperature": 0.8
  }
}
```

**Production (federated swarm):**
```json
{
  "model": {
    "provider": "openai",
    "base_url": "http://swarm-coordinator.local:8000/v1",
    "api_key": "not-needed",
    "model": "local-swarm",
    "temperature": 0.5
  }
}
```
### Testing the Configuration

After configuring opencode, test with:

```bash
# Simple test
opencode --version

# Test with a prompt
echo "Write a Python function to calculate factorial" | opencode
```
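If opencode misbehaves, it helps to confirm the server responds correctly on its own; when the script below works but opencode does not, the problem is in the opencode configuration rather than the server. A minimal sketch using the `requests` package (assumes `pip install requests` and the default `localhost:8000` address):

```python
import requests

BASE_URL = "http://localhost:8000/v1"

# Confirm the server is up and advertising a model
models = requests.get(f"{BASE_URL}/models", timeout=10).json()
print("Available models:", [m["id"] for m in models["data"]])

# Send one chat completion, mirroring what opencode does under the hood
resp = requests.post(
    f"{BASE_URL}/chat/completions",
    json={
        "model": "local-swarm",
        "messages": [{"role": "user", "content": "Say hello in one word."}],
        "max_tokens": 16,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```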
---

## API Reference

### OpenAI-Compatible Endpoints

Local Swarm implements the OpenAI API specification.

#### POST /v1/chat/completions

Generate a chat completion.

**Request:**
```json
{
  "model": "local-swarm",
  "messages": [
    {"role": "user", "content": "Write a Python function to calculate factorial"}
  ],
  "max_tokens": 2048,
  "temperature": 0.7,
  "stream": false
}
```

**Response:**
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "local-swarm",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "def factorial(n):\n    if n <= 1:\n        return 1\n    return n * factorial(n-1)"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 25,
    "total_tokens": 40
  }
}
```
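Because the endpoint follows the OpenAI wire format, the official `openai` Python package can talk to it directly. A minimal sketch (assumes `pip install openai`; streaming additionally assumes the server honors `stream: true`, as the request schema above suggests):

```python
from openai import OpenAI

# The client requires an api_key argument, but Local Swarm ignores it
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# Plain (non-streaming) completion
response = client.chat.completions.create(
    model="local-swarm",
    messages=[{"role": "user", "content": "Write a Python function to calculate factorial"}],
    max_tokens=2048,
    temperature=0.7,
)
print(response.choices[0].message.content)

# Streaming: tokens arrive incrementally as chunks
stream = client.chat.completions.create(
    model="local-swarm",
    messages=[{"role": "user", "content": "Count to five."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```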
#### GET /v1/models

List available models.

**Response:**
```json
{
  "object": "list",
  "data": [
    {
      "id": "local-swarm",
      "object": "model",
      "created": 1234567890,
      "owned_by": "local-swarm"
    }
  ]
}
```
#### GET /health

Check health status.

**Response:**
```json
{
  "status": "healthy",
  "version": "0.1.0",
  "workers": 5,
  "model": "Qwen 2.5 Coder 7b (q4_k_m)"
}
```
#### Federation Endpoints (when enabled)

**GET /v1/federation/status**
```json
{
  "enabled": true,
  "total_peers": 3,
  "healthy_peers": 3,
  "strategy": "weighted"
}
```

**GET /v1/federation/peers**
```json
{
  "peers": [
    {
      "name": "desktop-pc",
      "host": "192.168.1.100",
      "port": 8000,
      "model_id": "qwen2.5-coder:7b:q4_k_m",
      "instances": 3
    }
  ]
}
```

---
## Troubleshooting

### Common Issues

#### Issue: "No module named 'llama_cpp'"

**Solution:**
```bash
# Install with a pre-built CUDA wheel (recommended for NVIDIA GPUs)
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121

# Or CPU-only
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
```
#### Issue: "CUDA not detected" on Windows

**Solution:**
1. Install NVIDIA drivers: https://www.nvidia.com/drivers
2. Verify with: `nvidia-smi`
3. Reinstall with CUDA support:

```powershell
pip uninstall llama-cpp-python
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121
```
#### Issue: "Out of memory" errors

**Solution:**
```bash
# Reduce instances
python main.py --instances 2

# Or use a smaller model
python main.py --model qwen2.5-coder:3b:q4
```

#### Issue: Slow performance on CPU

**Solution:**
- Use smaller models (3B instead of 7B)
- Use Q4 quantization instead of Q6
- Reduce the number of instances to 2-3
- Close other applications
#### Issue: "No suitable model found"

**Solution:**
This usually means your system has less than 2GB of available memory. Try:
- Close other applications
- Use CPU-only mode (automatic if no GPU is detected)
- Add more RAM, or use a machine with a GPU

#### Issue: Models not downloading

**Solution:**
```bash
# Check internet connection
ping huggingface.co

# Try manual download
python main.py --download-only

# Check cache directory
ls ~/.local_swarm/models
```
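If the built-in downloader keeps failing, you can usually fetch a GGUF file by hand and place it in the cache directory. A sketch using the `huggingface_hub` package; the repository and file names are illustrative, so substitute the model and quantization you actually want, and check `~/.local_swarm/models` for the layout Local Swarm expects:

```python
import os

from huggingface_hub import hf_hub_download

# Illustrative repo and filename -- substitute the model you actually want
path = hf_hub_download(
    repo_id="Qwen/Qwen2.5-Coder-7B-Instruct-GGUF",
    filename="qwen2.5-coder-7b-instruct-q4_k_m.gguf",
    local_dir=os.path.expanduser("~/.local_swarm/models"),
)
print("Saved to", path)
```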
### Platform-Specific Issues

**Windows:**
- Ensure Python is in PATH
- Run PowerShell as Administrator if needed
- Install the Visual C++ Redistributable

**macOS:**
- Install Xcode Command Line Tools: `xcode-select --install`
- You may need to allow llama.cpp in Security preferences

**Linux:**
- Install build essentials: `sudo apt-get install build-essential`
- For AMD GPUs: install ROCm drivers
- For Intel GPUs: install the oneAPI toolkit

---
## Advanced Configuration

### Configuration File (config.yaml)

Create `config.yaml` in the project root:

```yaml
server:
  host: "127.0.0.1"
  port: 8000

swarm:
  consensus_strategy: "similarity"  # similarity, quality, fastest
  min_instances: 2
  max_instances: 5

federation:
  enabled: false
  discovery_port: 8765
  federation_port: 8766
  max_peers: 10

hardware:
  gpu_memory_fraction: 1.0  # Use 100% of GPU VRAM
  ram_fraction: 0.5         # Use 50% of system RAM for CPU inference

models:
  cache_dir: "~/.local_swarm/models"
  preferred_models:
    - qwen2.5-coder
    - deepseek-coder
```
### Environment Variables

```bash
# Custom cache directory
export LOCAL_SWARM_CACHE_DIR="/path/to/models"

# Debug mode
export LOCAL_SWARM_DEBUG=1

# Custom config file
export LOCAL_SWARM_CONFIG="/path/to/config.yaml"
```
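As an illustration of how `LOCAL_SWARM_CONFIG` interacts with the config file, here is a minimal loader sketch (assumes PyYAML; this shows the expected behavior, not Local Swarm's actual implementation):

```python
import os

import yaml  # pip install pyyaml

def load_config() -> dict:
    """Load config.yaml, honoring the LOCAL_SWARM_CONFIG override."""
    path = os.environ.get("LOCAL_SWARM_CONFIG", "config.yaml")
    with open(os.path.expanduser(path)) as f:
        return yaml.safe_load(f)

config = load_config()
print(config["swarm"]["consensus_strategy"])
```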
---

## Performance Tuning

### For Maximum Speed

```bash
# Use a smaller model
python main.py --model qwen2.5-coder:3b:q4

# Reduce instances (less memory contention)
python main.py --instances 2

# Skip consensus (single worker)
# Edit config: consensus_strategy: "fastest"
```

### For Maximum Quality

```bash
# Use the largest model that fits
python main.py --model qwen2.5-coder:7b:q6

# More instances for better consensus
python main.py --instances 5

# Use the quality consensus strategy
# Edit config: consensus_strategy: "quality"
```

### For Balanced Performance

```bash
# Recommended defaults (automatic)
python main.py

# Or explicitly
python main.py --model qwen2.5-coder:7b:q4
```

### Memory Usage by Model

| Model Size | Q4 VRAM | Q5 VRAM   | Q6 VRAM |
|------------|---------|-----------|---------|
| 1B-3B      | 0.7-2GB | 0.9-2.5GB | 1.1-3GB |
| 7B         | 4.5GB   | 5.2GB     | 6.0GB   |
| 13B-15B    | 8-9GB   | 9.5-11GB  | 11-13GB |

**Recommended:** Use Q4_K_M for the best speed/quality balance.
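The table values are approximate. As a rough rule of thumb, you can estimate GGUF memory from the parameter count and quantization width; a back-of-envelope sketch (the bits-per-weight figures and the ~15% allowance for KV cache and buffers are assumptions, not measured values):

```python
# Rough bits-per-weight for common GGUF quantizations (approximate)
BITS_PER_WEIGHT = {"q4_k_m": 4.8, "q5_k_m": 5.7, "q6_k": 6.6}

def estimate_vram_gb(params_billion: float, quant: str, overhead: float = 1.15) -> float:
    """Estimate VRAM in GB: weights at the quantized width, plus ~15% for KV cache/buffers."""
    weight_gb = params_billion * BITS_PER_WEIGHT[quant] / 8
    return weight_gb * overhead

print(f"7B at q4_k_m: ~{estimate_vram_gb(7, 'q4_k_m'):.1f} GB")  # in the ballpark of the table's 4.5GB
```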
---

## MCP Server Configuration

### Enable MCP Server

```bash
python main.py --mcp
```

### MCP Tools Available

When MCP is enabled, AI assistants can use:

- `get_hardware_info` - Query system capabilities
- `get_swarm_status` - Check swarm health
- `generate_code` - Generate code with consensus
- `list_available_models` - Browse models
- `get_worker_details` - Worker statistics

### Testing MCP

```bash
# List available tools
mcp-cli call local-swarm list_tools

# Call a tool
mcp-cli call local-swarm call_tool get_swarm_status
```
---

## Network Federation

### Setup Federated Swarm

On each machine in your network:

```bash
# Machine 1 (Windows PC with RTX 4060)
python main.py --federation --port 8000

# Machine 2 (Mac Mini M1)
python main.py --federation --port 8000

# Machine 3 (Linux with AMD GPU)
python main.py --federation --port 8000
```

Machines will auto-discover each other via mDNS.
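If peers are not finding each other, you can check that mDNS traffic is visible at all with the `zeroconf` package. The service type below is a generic placeholder, since the name Local Swarm registers is not documented here; browsing for `_http._tcp.local.` at least confirms multicast DNS works on your network:

```python
import time

from zeroconf import ServiceBrowser, ServiceListener, Zeroconf

class PrintListener(ServiceListener):
    def add_service(self, zc, type_, name):
        print("Found:", name)
    def remove_service(self, zc, type_, name):
        print("Gone:", name)
    def update_service(self, zc, type_, name):
        pass

zc = Zeroconf()
# "_http._tcp.local." is a generic probe; Local Swarm's actual service type may differ.
browser = ServiceBrowser(zc, "_http._tcp.local.", PrintListener())
time.sleep(5)  # listen briefly for announcements
zc.close()
```

If nothing shows up, multicast is likely blocked (common on guest Wi-Fi and some VPNs).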
### Verify Federation

```bash
curl http://localhost:8000/v1/federation/status
curl http://localhost:8000/v1/federation/peers
```
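The same check from Python, as a small `requests` sketch that prints one line per peer (assumes the federation endpoints documented in the API Reference above and the default address):

```python
import requests

BASE = "http://localhost:8000"

status = requests.get(f"{BASE}/v1/federation/status", timeout=10).json()
print(f"Federation enabled: {status['enabled']}, "
      f"healthy peers: {status['healthy_peers']}/{status['total_peers']}")

peers = requests.get(f"{BASE}/v1/federation/peers", timeout=10).json()
for peer in peers["peers"]:
    print(f"- {peer['name']} at {peer['host']}:{peer['port']} "
          f"({peer['model_id']}, {peer['instances']} instances)")
```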
---

## Getting Help

- **GitHub Issues:** https://github.com/sleepyeldrazi/local_swarm/issues
- **Interactive Help:** Run `python main.py` and select `[t] Tips & Help`
- **Hardware Detection:** Run `python main.py --detect`

## License

MIT License - See LICENSE file