# Local Swarm - Complete Documentation

## Table of Contents

1. [Quick Start Guide](#quick-start-guide)
2. [Opencode Configuration](#opencode-configuration)
3. [API Reference](#api-reference)
4. [Troubleshooting](#troubleshooting)
5. [Advanced Configuration](#advanced-configuration)
6. [Performance Tuning](#performance-tuning)
7. [MCP Server Configuration](#mcp-server-configuration)
8. [Network Federation](#network-federation)

---
## Quick Start Guide

### Installation

**Windows:**
```powershell
git clone https://github.com/yourusername/local_swarm.git
cd local_swarm
.\scripts\install.bat
```

**macOS/Linux:**
```bash
git clone https://github.com/yourusername/local_swarm.git
cd local_swarm
chmod +x scripts/install.sh
./scripts/install.sh
```

**Android (Termux):**
```bash
git clone https://github.com/yourusername/local_swarm.git
cd local_swarm
chmod +x scripts/install-termux.sh
./scripts/install-termux.sh
```

### First Run

```bash
# Start with interactive menu
python main.py

# Or skip menu with auto-detection
python main.py --auto
```

---
## Opencode Configuration

### Basic Configuration

Add to your opencode configuration file (usually `~/.config/opencode/config.json`):

```json
{
  "model": {
    "provider": "openai",
    "base_url": "http://localhost:8000/v1",
    "api_key": "not-needed",
    "model": "local-swarm"
  }
}
```
### Remote Machine Setup

If Local Swarm is running on another computer on your network, point `base_url` at that machine's address:

```json
{
  "model": {
    "provider": "openai",
    "base_url": "http://192.168.1.100:8000/v1",
    "api_key": "not-needed",
    "model": "local-swarm"
  }
}
```
### Multiple Model Options

You can configure multiple models and switch between them:

```json
{
  "models": {
    "local-swarm": {
      "provider": "openai",
      "base_url": "http://localhost:8000/v1",
      "api_key": "not-needed",
      "model": "local-swarm"
    },
    "local-swarm-fast": {
      "provider": "openai",
      "base_url": "http://localhost:8000/v1",
      "api_key": "not-needed",
      "model": "local-swarm",
      "temperature": 0.2
    }
  },
  "default_model": "local-swarm"
}
```
### With Context Window Configuration

```json
{
  "model": {
    "provider": "openai",
    "base_url": "http://localhost:8000/v1",
    "api_key": "not-needed",
    "model": "local-swarm",
    "max_tokens": 4096,
    "temperature": 0.7
  }
}
```
### Environment-Specific Configurations

**Development (local only):**
```json
{
  "model": {
    "provider": "openai",
    "base_url": "http://localhost:8000/v1",
    "api_key": "not-needed",
    "model": "local-swarm",
    "temperature": 0.8
  }
}
```

**Production (federated swarm):**
```json
{
  "model": {
    "provider": "openai",
    "base_url": "http://swarm-coordinator.local:8000/v1",
    "api_key": "not-needed",
    "model": "local-swarm",
    "temperature": 0.5
  }
}
```
### Testing the Configuration

After configuring opencode, test with:

```bash
# Simple test
opencode --version

# Test with a prompt
echo "Write a Python function to calculate factorial" | opencode
```
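If opencode misbehaves, it helps to confirm the server responds correctly on its own; when the script below works but opencode does not, the problem is in the opencode configuration rather than the server. A minimal sketch using the `requests` package (assumes `pip install requests` and the default `localhost:8000` address):

```python
import requests

BASE_URL = "http://localhost:8000/v1"

# Confirm the server is up and advertising a model
models = requests.get(f"{BASE_URL}/models", timeout=10).json()
print("Available models:", [m["id"] for m in models["data"]])

# Send one chat completion, mirroring what opencode does under the hood
resp = requests.post(
    f"{BASE_URL}/chat/completions",
    json={
        "model": "local-swarm",
        "messages": [{"role": "user", "content": "Say hello in one word."}],
        "max_tokens": 16,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```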
---

## API Reference

### OpenAI-Compatible Endpoints

Local Swarm implements the OpenAI API specification.

#### POST /v1/chat/completions

Generate a chat completion.

**Request:**
```json
{
  "model": "local-swarm",
  "messages": [
    {"role": "user", "content": "Write a Python function to calculate factorial"}
  ],
  "max_tokens": 2048,
  "temperature": 0.7,
  "stream": false
}
```

**Response:**
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "local-swarm",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "def factorial(n):\n    if n <= 1:\n        return 1\n    return n * factorial(n-1)"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 25,
    "total_tokens": 40
  }
}
```
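Because the endpoint follows the OpenAI wire format, the official `openai` Python package can talk to it directly. A minimal sketch (assumes `pip install openai`; streaming additionally assumes the server honors `stream: true`, as the request schema above suggests):

```python
from openai import OpenAI

# The client requires an api_key argument, but Local Swarm ignores it
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# Plain (non-streaming) completion
response = client.chat.completions.create(
    model="local-swarm",
    messages=[{"role": "user", "content": "Write a Python function to calculate factorial"}],
    max_tokens=2048,
    temperature=0.7,
)
print(response.choices[0].message.content)

# Streaming: tokens arrive incrementally as chunks
stream = client.chat.completions.create(
    model="local-swarm",
    messages=[{"role": "user", "content": "Count to five."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```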
#### GET /v1/models

List available models.

**Response:**
```json
{
  "object": "list",
  "data": [
    {
      "id": "local-swarm",
      "object": "model",
      "created": 1234567890,
      "owned_by": "local-swarm"
    }
  ]
}
```
#### GET /health

Check health status.

**Response:**
```json
{
  "status": "healthy",
  "version": "0.1.0",
  "workers": 5,
  "model": "Qwen 2.5 Coder 7b (q4_k_m)"
}
```
#### Federation Endpoints (when enabled)

**GET /v1/federation/status**
```json
{
  "enabled": true,
  "total_peers": 3,
  "healthy_peers": 3,
  "strategy": "weighted"
}
```

**GET /v1/federation/peers**
```json
{
  "peers": [
    {
      "name": "desktop-pc",
      "host": "192.168.1.100",
      "port": 8000,
      "model_id": "qwen2.5-coder:7b:q4_k_m",
      "instances": 3
    }
  ]
}
```

---
## Troubleshooting

### Common Issues

#### Issue: "No module named 'llama_cpp'"

**Solution:**
```bash
# Install with a pre-built CUDA wheel (recommended for NVIDIA GPUs)
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121

# Or CPU-only
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
```
#### Issue: "CUDA not detected" on Windows

**Solution:**
1. Install NVIDIA drivers: https://www.nvidia.com/drivers
2. Verify with: `nvidia-smi`
3. Reinstall with CUDA support:

```powershell
pip uninstall llama-cpp-python
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121
```
#### Issue: "Out of memory" errors

**Solution:**
```bash
# Reduce instances
python main.py --instances 2

# Or use a smaller model
python main.py --model qwen2.5-coder:3b:q4
```

#### Issue: Slow performance on CPU

**Solution:**
- Use smaller models (3B instead of 7B)
- Use Q4 quantization instead of Q6
- Reduce the number of instances to 2-3
- Close other applications
#### Issue: "No suitable model found"

**Solution:**
This usually means your system has less than 2GB of available memory. Try:
- Close other applications
- Use CPU-only mode (automatic if no GPU is detected)
- Add more RAM, or use a machine with a GPU

#### Issue: Models not downloading

**Solution:**
```bash
# Check internet connection
ping huggingface.co

# Try manual download
python main.py --download-only

# Check cache directory
ls ~/.local_swarm/models
```
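If the built-in downloader keeps failing, you can usually fetch a GGUF file by hand and place it in the cache directory. A sketch using the `huggingface_hub` package; the repository and file names are illustrative, so substitute the model and quantization you actually want, and check `~/.local_swarm/models` for the layout Local Swarm expects:

```python
import os

from huggingface_hub import hf_hub_download

# Illustrative repo and filename -- substitute the model you actually want
path = hf_hub_download(
    repo_id="Qwen/Qwen2.5-Coder-7B-Instruct-GGUF",
    filename="qwen2.5-coder-7b-instruct-q4_k_m.gguf",
    local_dir=os.path.expanduser("~/.local_swarm/models"),
)
print("Saved to", path)
```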
### Platform-Specific Issues

**Windows:**
- Ensure Python is in PATH
- Run PowerShell as Administrator if needed
- Install the Visual C++ Redistributable

**macOS:**
- Install Xcode Command Line Tools: `xcode-select --install`
- You may need to allow llama.cpp in Security preferences

**Linux:**
- Install build essentials: `sudo apt-get install build-essential`
- For AMD GPUs: install ROCm drivers
- For Intel GPUs: install the oneAPI toolkit

---
## Advanced Configuration

### Configuration File (config.yaml)

Create `config.yaml` in the project root:

```yaml
server:
  host: "127.0.0.1"
  port: 8000

swarm:
  consensus_strategy: "similarity"  # similarity, quality, fastest
  min_instances: 2
  max_instances: 5

federation:
  enabled: false
  discovery_port: 8765
  federation_port: 8766
  max_peers: 10

hardware:
  gpu_memory_fraction: 1.0  # Use 100% of GPU VRAM
  ram_fraction: 0.5         # Use 50% of system RAM for CPU inference

models:
  cache_dir: "~/.local_swarm/models"
  preferred_models:
    - qwen2.5-coder
    - deepseek-coder
```
### Environment Variables

```bash
# Custom cache directory
export LOCAL_SWARM_CACHE_DIR="/path/to/models"

# Debug mode
export LOCAL_SWARM_DEBUG=1

# Custom config file
export LOCAL_SWARM_CONFIG="/path/to/config.yaml"
```
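As an illustration of how `LOCAL_SWARM_CONFIG` interacts with the config file, here is a minimal loader sketch (assumes PyYAML; this shows the expected behavior, not Local Swarm's actual implementation):

```python
import os

import yaml  # pip install pyyaml

def load_config() -> dict:
    """Load config.yaml, honoring the LOCAL_SWARM_CONFIG override."""
    path = os.environ.get("LOCAL_SWARM_CONFIG", "config.yaml")
    with open(os.path.expanduser(path)) as f:
        return yaml.safe_load(f)

config = load_config()
print(config["swarm"]["consensus_strategy"])
```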
---

## Performance Tuning

### For Maximum Speed

```bash
# Use a smaller model
python main.py --model qwen2.5-coder:3b:q4

# Reduce instances (less memory contention)
python main.py --instances 2

# Skip consensus (single worker)
# Edit config: consensus_strategy: "fastest"
```

### For Maximum Quality

```bash
# Use the largest model that fits
python main.py --model qwen2.5-coder:7b:q6

# More instances for better consensus
python main.py --instances 5

# Use the quality consensus strategy
# Edit config: consensus_strategy: "quality"
```

### For Balanced Performance

```bash
# Recommended defaults (automatic)
python main.py

# Or explicitly
python main.py --model qwen2.5-coder:7b:q4
```

### Memory Usage by Model

| Model Size | Q4 VRAM | Q5 VRAM   | Q6 VRAM |
|------------|---------|-----------|---------|
| 1B-3B      | 0.7-2GB | 0.9-2.5GB | 1.1-3GB |
| 7B         | 4.5GB   | 5.2GB     | 6.0GB   |
| 13B-15B    | 8-9GB   | 9.5-11GB  | 11-13GB |

**Recommended:** Use Q4_K_M for the best speed/quality balance.
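The table values are approximate. As a rough rule of thumb, you can estimate GGUF memory from the parameter count and quantization width; a back-of-envelope sketch (the bits-per-weight figures and the ~15% allowance for KV cache and buffers are assumptions, not measured values):

```python
# Rough bits-per-weight for common GGUF quantizations (approximate)
BITS_PER_WEIGHT = {"q4_k_m": 4.8, "q5_k_m": 5.7, "q6_k": 6.6}

def estimate_vram_gb(params_billion: float, quant: str, overhead: float = 1.15) -> float:
    """Estimate VRAM in GB: weights at the quantized width, plus ~15% for KV cache/buffers."""
    weight_gb = params_billion * BITS_PER_WEIGHT[quant] / 8
    return weight_gb * overhead

print(f"7B at q4_k_m: ~{estimate_vram_gb(7, 'q4_k_m'):.1f} GB")  # in the ballpark of the table's 4.5GB
```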
---

## MCP Server Configuration

### Enable MCP Server

```bash
python main.py --mcp
```

### MCP Tools Available

When MCP is enabled, AI assistants can use:

- `get_hardware_info` - Query system capabilities
- `get_swarm_status` - Check swarm health
- `generate_code` - Generate code with consensus
- `list_available_models` - Browse models
- `get_worker_details` - Worker statistics

### Testing MCP

```bash
# List available tools
mcp-cli call local-swarm list_tools

# Call a tool
mcp-cli call local-swarm call_tool get_swarm_status
```
---

## Network Federation

### Setup Federated Swarm

On each machine in your network:

```bash
# Machine 1 (Windows PC with RTX 4060)
python main.py --federation --port 8000

# Machine 2 (Mac Mini M1)
python main.py --federation --port 8000

# Machine 3 (Linux with AMD GPU)
python main.py --federation --port 8000
```

Machines will auto-discover each other via mDNS.
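If peers are not finding each other, you can check that mDNS traffic is visible at all with the `zeroconf` package. The service type below is a generic placeholder, since the name Local Swarm registers is not documented here; browsing for `_http._tcp.local.` at least confirms multicast DNS works on your network:

```python
import time

from zeroconf import ServiceBrowser, ServiceListener, Zeroconf

class PrintListener(ServiceListener):
    def add_service(self, zc, type_, name):
        print("Found:", name)
    def remove_service(self, zc, type_, name):
        print("Gone:", name)
    def update_service(self, zc, type_, name):
        pass

zc = Zeroconf()
# "_http._tcp.local." is a generic probe; Local Swarm's actual service type may differ.
browser = ServiceBrowser(zc, "_http._tcp.local.", PrintListener())
time.sleep(5)  # listen briefly for announcements
zc.close()
```

If nothing shows up, multicast is likely blocked (common on guest Wi-Fi and some VPNs).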
### Verify Federation

```bash
curl http://localhost:8000/v1/federation/status
curl http://localhost:8000/v1/federation/peers
```
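The same check from Python, as a small `requests` sketch that prints one line per peer (assumes the federation endpoints documented in the API Reference above and the default address):

```python
import requests

BASE = "http://localhost:8000"

status = requests.get(f"{BASE}/v1/federation/status", timeout=10).json()
print(f"Federation enabled: {status['enabled']}, "
      f"healthy peers: {status['healthy_peers']}/{status['total_peers']}")

peers = requests.get(f"{BASE}/v1/federation/peers", timeout=10).json()
for peer in peers["peers"]:
    print(f"- {peer['name']} at {peer['host']}:{peer['port']} "
          f"({peer['model_id']}, {peer['instances']} instances)")
```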
---

## Getting Help

- **GitHub Issues:** https://github.com/sleepyeldrazi/local_swarm/issues
- **Interactive Help:** Run `python main.py` and select `[t] Tips & Help`
- **Hardware Detection:** Run `python main.py --detect`

## License

MIT License - See LICENSE file