# Local Swarm - Complete Documentation
## Table of Contents
1. [Quick Start Guide](#quick-start-guide)
2. [Opencode Configuration](#opencode-configuration)
3. [API Reference](#api-reference)
4. [Troubleshooting](#troubleshooting)
5. [Advanced Configuration](#advanced-configuration)
6. [Performance Tuning](#performance-tuning)
---
## Quick Start Guide
### Installation
**Windows:**
```powershell
git clone https://github.com/yourusername/local_swarm.git
cd local_swarm
.\scripts\install.bat
```
**macOS/Linux:**
```bash
git clone https://github.com/yourusername/local_swarm.git
cd local_swarm
chmod +x scripts/install.sh
./scripts/install.sh
```
**Android (Termux):**
```bash
git clone https://github.com/yourusername/local_swarm.git
cd local_swarm
chmod +x scripts/install-termux.sh
./scripts/install-termux.sh
```
### First Run
```bash
# Start with interactive menu
python main.py
# Or skip menu with auto-detection
python main.py --auto
```
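Once the server is up, you can confirm it is responding (a quick check assuming the default host and port; the `/health` endpoint is documented in the API Reference below):
```bash
# Verify the server is healthy on the default address
curl http://localhost:8000/health
```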
---
## Opencode Configuration
### Basic Configuration
Add to your opencode configuration file (usually `~/.config/opencode/config.json`):
```json
{
  "model": {
    "provider": "openai",
    "base_url": "http://localhost:8000/v1",
    "api_key": "not-needed",
    "model": "local-swarm"
  }
}
```
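To confirm that `base_url` points at a live server and that the `local-swarm` model id is actually being served, query the models endpoint (documented in the API Reference below):
```bash
# Should list a model with id "local-swarm"
curl http://localhost:8000/v1/models
```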
### Local Swarm on a Different Machine
If Local Swarm is running on another computer in your network:
```json
{
  "model": {
    "provider": "openai",
    "base_url": "http://192.168.1.100:8000/v1",
    "api_key": "not-needed",
    "model": "local-swarm"
  }
}
```
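Note that the default `config.yaml` shown later in this guide binds the server to `127.0.0.1`, which is not reachable from other machines; to serve a remote client you will likely need to change `server.host` (for example to `0.0.0.0`) and then verify connectivity from the client machine:
```bash
# From the machine running opencode (the IP is illustrative)
curl http://192.168.1.100:8000/health
```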
### Multiple Model Options
You can configure multiple models and switch between them:
```json
{
  "models": {
    "local-swarm": {
      "provider": "openai",
      "base_url": "http://localhost:8000/v1",
      "api_key": "not-needed",
      "model": "local-swarm"
    },
    "local-swarm-fast": {
      "provider": "openai",
      "base_url": "http://localhost:8000/v1",
      "api_key": "not-needed",
      "model": "local-swarm",
      "temperature": 0.2
    }
  },
  "default_model": "local-swarm"
}
```
### With Context Window Configuration
```json
{
  "model": {
    "provider": "openai",
    "base_url": "http://localhost:8000/v1",
    "api_key": "not-needed",
    "model": "local-swarm",
    "max_tokens": 4096,
    "temperature": 0.7
  }
}
```
### Environment-Specific Configurations
**Development (local only):**
```json
{
  "model": {
    "provider": "openai",
    "base_url": "http://localhost:8000/v1",
    "api_key": "not-needed",
    "model": "local-swarm",
    "temperature": 0.8
  }
}
```
**Production (federated swarm):**
```json
{
  "model": {
    "provider": "openai",
    "base_url": "http://swarm-coordinator.local:8000/v1",
    "api_key": "not-needed",
    "model": "local-swarm",
    "temperature": 0.5
  }
}
```
### Testing the Configuration
After configuring opencode, test with:
```bash
# Simple test
opencode --version
# Test with a prompt
echo "Write a Python function to calculate factorial" | opencode
```
---
## API Reference
### OpenAI-Compatible Endpoints
Local Swarm implements the OpenAI API specification, so any OpenAI-compatible client can connect to it.
#### POST /v1/chat/completions
Generate a chat completion.
**Request:**
```json
{
  "model": "local-swarm",
  "messages": [
    {"role": "user", "content": "Write a Python function to calculate factorial"}
  ],
  "max_tokens": 2048,
  "temperature": 0.7,
  "stream": false
}
```
**Response:**
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "local-swarm",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "def factorial(n):\n    if n <= 1:\n        return 1\n    return n * factorial(n-1)"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 25,
    "total_tokens": 40
  }
}
```
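The same request can be exercised from the command line with curl (a minimal example; the payload mirrors the request above, and setting `"stream": true` should instead return incremental chunks if your build supports streaming):
```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "local-swarm",
    "messages": [
      {"role": "user", "content": "Write a Python function to calculate factorial"}
    ],
    "max_tokens": 2048,
    "temperature": 0.7,
    "stream": false
  }'
```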
#### GET /v1/models
List available models.
**Response:**
```json
{
  "object": "list",
  "data": [
    {
      "id": "local-swarm",
      "object": "model",
      "created": 1234567890,
      "owned_by": "local-swarm"
    }
  ]
}
```
#### GET /health
Check health status.
**Response:**
```json
{
  "status": "healthy",
  "version": "0.1.0",
  "workers": 5,
  "model": "Qwen 2.5 Coder 7b (q4_k_m)"
}
```
#### Federation Endpoints (when enabled)
**GET /v1/federation/status**
```json
{
  "enabled": true,
  "total_peers": 3,
  "healthy_peers": 3,
  "strategy": "weighted"
}
```
**GET /v1/federation/peers**
```json
{
  "peers": [
    {
      "name": "desktop-pc",
      "host": "192.168.1.100",
      "port": 8000,
      "model_id": "qwen2.5-coder:7b:q4_k_m",
      "instances": 3
    }
  ]
}
```
---
## Troubleshooting
### Common Issues
#### Issue: "No module named 'llama_cpp'"
**Solution:**
```bash
# Install with pre-built wheel (recommended)
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121
# Or CPU-only
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
```
#### Issue: "CUDA not detected" on Windows
**Solution:**
1. Install NVIDIA drivers: https://www.nvidia.com/drivers
2. Verify with: `nvidia-smi`
3. Reinstall with CUDA support:
```powershell
pip uninstall llama-cpp-python
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121
```
#### Issue: "Out of memory" errors
**Solution:**
```bash
# Reduce instances
python main.py --instances 2
# Or use smaller model
python main.py --model qwen2.5-coder:3b:q4
```
#### Issue: Slow performance on CPU
**Solution:**
- Use smaller models (3B instead of 7B)
- Use Q4 quantization instead of Q6
- Reduce number of instances to 2-3
- Close other applications
#### Issue: "No suitable model found"
**Solution:**
This usually means your system has less than 2GB of available memory, so no model fits. Try:
- Close other applications to free memory
- Use CPU-only mode (selected automatically if no GPU is detected)
- Add more RAM or use a machine with a GPU
#### Issue: Models not downloading
**Solution:**
```bash
# Check internet connection
ping huggingface.co
# Try manual download
python main.py --download-only
# Check cache directory
ls ~/.local_swarm/models
```
### Platform-Specific Issues
**Windows:**
- Ensure Python is in PATH
- Run PowerShell as Administrator if needed
- Install Visual C++ Redistributable
**macOS:**
- Xcode Command Line Tools: `xcode-select --install`
- Allow llama.cpp in the Security & Privacy preferences if macOS blocks it
**Linux:**
- Install build essentials: `sudo apt-get install build-essential`
- For AMD: Install ROCm drivers
- For Intel: Install oneAPI toolkit
---
## Advanced Configuration
### Configuration File (config.yaml)
Create `config.yaml` in the project root:
```yaml
server:
  host: "127.0.0.1"
  port: 8000

swarm:
  consensus_strategy: "similarity"  # similarity, quality, fastest
  min_instances: 2
  max_instances: 5

federation:
  enabled: false
  discovery_port: 8765
  federation_port: 8766
  max_peers: 10

hardware:
  gpu_memory_fraction: 1.0  # Use 100% of GPU VRAM
  ram_fraction: 0.5         # Use 50% of system RAM for CPU

models:
  cache_dir: "~/.local_swarm/models"
  preferred_models:
    - qwen2.5-coder
    - deepseek-coder
```
### Environment Variables
```bash
# Custom cache directory
export LOCAL_SWARM_CACHE_DIR="/path/to/models"
# Debug mode
export LOCAL_SWARM_DEBUG=1
# Custom config file
export LOCAL_SWARM_CONFIG="/path/to/config.yaml"
```
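These can also be set inline for a single run; for example (the paths are illustrative):
```bash
# One-off run with a custom cache location and debug logging enabled
LOCAL_SWARM_CACHE_DIR="/mnt/storage/models" LOCAL_SWARM_DEBUG=1 python main.py --auto
```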
---
## Performance Tuning
### For Maximum Speed
```bash
# Use smaller model
python main.py --model qwen2.5-coder:3b:q4
# Reduce instances (less memory contention)
python main.py --instances 2
# Skip consensus (single worker)
# Edit config: consensus_strategy: "fastest"
```
### For Maximum Quality
```bash
# Use largest model that fits
python main.py --model qwen2.5-coder:7b:q6
# More instances for better consensus
python main.py --instances 5
# Use quality consensus strategy
# Edit config: consensus_strategy: "quality"
```
### For Balanced Performance
```bash
# Recommended defaults (automatic)
python main.py
# Or explicitly
python main.py --model qwen2.5-coder:7b:q4
```
### Memory Usage by Model
| Model Size | Q4 VRAM | Q5 VRAM | Q6 VRAM |
|------------|---------|---------|---------|
| 1B-3B | 0.7-2GB | 0.9-2.5GB | 1.1-3GB |
| 7B | 4.5GB | 5.2GB | 6.0GB |
| 13B-15B | 8-9GB | 9.5-11GB | 11-13GB |
**Recommended:** Use Q4_K_M for best speed/quality balance.
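As a rough sizing check (assuming the table lists per-instance usage): three instances of a 7B model at Q4 need about 3 × 4.5GB ≈ 13.5GB of VRAM, so a 12GB GPU would be limited to two instances at that size.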
---
## MCP Server Configuration
### Enable MCP Server
```bash
python main.py --mcp
```
### MCP Tools Available
When MCP is enabled, AI assistants can use:
- `get_hardware_info` - Query system capabilities
- `get_swarm_status` - Check swarm health
- `generate_code` - Generate with consensus
- `list_available_models` - Browse models
- `get_worker_details` - Worker statistics
### Testing MCP
```bash
# List available tools
mcp-cli call local-swarm list_tools
# Call a tool
mcp-cli call local-swarm call_tool get_swarm_status
```
---
## Network Federation
### Setup Federated Swarm
On each machine in your network:
```bash
# Machine 1 (Windows PC with RTX 4060)
python main.py --federation --port 8000
# Machine 2 (Mac Mini M1)
python main.py --federation --port 8000
# Machine 3 (Linux with AMD GPU)
python main.py --federation --port 8000
```
Machines will auto-discover each other via mDNS.
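If peers fail to appear, make sure each machine allows inbound traffic on the discovery and federation ports (8765 and 8766 by default, per the `config.yaml` reference above) and that all machines are on the same subnet, since mDNS does not cross routers.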
### Verify Federation
```bash
curl http://localhost:8000/v1/federation/status
curl http://localhost:8000/v1/federation/peers
```
---
## Getting Help
- **GitHub Issues:** https://github.com/sleepyeldrazi/local_swarm/issues
- **Interactive Help:** Run `python main.py` and select `[t] Tips & Help`
- **Hardware Detection:** Run `python main.py --detect`
## License
MIT License - See LICENSE file