# Local Swarm - Complete Documentation

## Table of Contents

- [Quick Start Guide](#quick-start-guide)
- [Opencode Configuration](#opencode-configuration)
- [API Reference](#api-reference)
- [Troubleshooting](#troubleshooting)
- [Advanced Configuration](#advanced-configuration)
- [Performance Tuning](#performance-tuning)
- [MCP Server Configuration](#mcp-server-configuration)
- [Network Federation](#network-federation)
- [Getting Help](#getting-help)
## Quick Start Guide

### Installation

**Windows:**

```bash
git clone https://github.com/yourusername/local_swarm.git
cd local_swarm
.\scripts\install.bat
```

**macOS/Linux:**

```bash
git clone https://github.com/yourusername/local_swarm.git
cd local_swarm
chmod +x scripts/install.sh
./scripts/install.sh
```

**Android (Termux):**

```bash
git clone https://github.com/yourusername/local_swarm.git
cd local_swarm
chmod +x scripts/install-termux.sh
./scripts/install-termux.sh
```

### First Run

```bash
# Start with interactive menu
python main.py
# Or skip menu with auto-detection
python main.py --auto
```

## Opencode Configuration

### Basic Configuration

Add to your opencode configuration file (usually `~/.config/opencode/config.json`):

```json
{
"model": {
"provider": "openai",
"base_url": "http://localhost:8000/v1",
"api_key": "not-needed",
"model": "local-swarm"
}
}
```

### Remote Machine Setup

If Local Swarm is running on another computer on your network, point `base_url` at that machine's address:

```json
{
"model": {
"provider": "openai",
"base_url": "http://192.168.1.100:8000/v1",
"api_key": "not-needed",
"model": "local-swarm"
}
}
```

### Multiple Model Options

You can configure multiple models and switch between them:

```json
{
"models": {
"local-swarm": {
"provider": "openai",
"base_url": "http://localhost:8000/v1",
"api_key": "not-needed",
"model": "local-swarm"
},
"local-swarm-fast": {
"provider": "openai",
"base_url": "http://localhost:8000/v1",
"api_key": "not-needed",
"model": "local-swarm",
"temperature": 0.2
}
},
"default_model": "local-swarm"
}
```

### Context Window Configuration

To cap response length and set a default sampling temperature:

```json
{
"model": {
"provider": "openai",
"base_url": "http://localhost:8000/v1",
"api_key": "not-needed",
"model": "local-swarm",
"max_tokens": 4096,
"temperature": 0.7
}
}
```

### Environment-Specific Configurations

**Development (local only):**

```json
{
"model": {
"provider": "openai",
"base_url": "http://localhost:8000/v1",
"api_key": "not-needed",
"model": "local-swarm",
"temperature": 0.8
}
}
```

**Production (federated swarm):**

```json
{
"model": {
"provider": "openai",
"base_url": "http://swarm-coordinator.local:8000/v1",
"api_key": "not-needed",
"model": "local-swarm",
"temperature": 0.5
}
}
```

### Testing the Configuration

After configuring opencode, test with:

```bash
# Simple test
opencode --version
# Test with a prompt
echo "Write a Python function to calculate factorial" | opencode
## API Reference

### OpenAI-Compatible Endpoints

Local Swarm implements the OpenAI API specification.

#### POST /v1/chat/completions

Generate a chat completion.

**Request:**

```json
{
"model": "local-swarm",
"messages": [
{"role": "user", "content": "Write a Python function to calculate factorial"}
],
"max_tokens": 2048,
"temperature": 0.7,
"stream": false
}
```

**Response:**

```json
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1234567890,
"model": "local-swarm",
"choices": [{
"index": 0,
"message": {
"role": "assistant",
"content": "def factorial(n):\n if n <= 1:\n return 1\n return n * factorial(n-1)"
},
"finish_reason": "stop"
}],
"usage": {
"prompt_tokens": 15,
"completion_tokens": 25,
"total_tokens": 40
}
}
```
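
Because the endpoint follows the OpenAI specification, you can call it with the official `openai` Python client (v1.x) by overriding `base_url`. A minimal sketch, assuming the default `localhost:8000` address:

```python
# Requires: pip install openai  (v1.x client)
from openai import OpenAI

# Point the client at Local Swarm instead of api.openai.com.
# The api_key is required by the client but ignored by the server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="local-swarm",
    messages=[{"role": "user", "content": "Write a Python function to calculate factorial"}],
    max_tokens=2048,
    temperature=0.7,
)
print(response.choices[0].message.content)

# The request schema above includes a "stream" flag, so streaming should
# work the same way: pass stream=True and iterate the chunks.
for chunk in client.chat.completions.create(
    model="local-swarm",
    messages=[{"role": "user", "content": "Explain recursion in one sentence"}],
    stream=True,
):
    print(chunk.choices[0].delta.content or "", end="")
```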
#### GET /v1/models

List available models.

**Response:**

```json
{
"object": "list",
"data": [
{
"id": "local-swarm",
"object": "model",
"created": 1234567890,
"owned_by": "local-swarm"
}
]
}
```

#### GET /health

Check health status.

**Response:**

```json
{
"status": "healthy",
"version": "0.1.0",
"workers": 5,
"model": "Qwen 2.5 Coder 7b (q4_k_m)"
}
```

### Federation Endpoints (when enabled)

#### GET /v1/federation/status

```json
{
"enabled": true,
"total_peers": 3,
"healthy_peers": 3,
"strategy": "weighted"
}
```

#### GET /v1/federation/peers

```json
{
"peers": [
{
"name": "desktop-pc",
"host": "192.168.1.100",
"port": 8000,
"model_id": "qwen2.5-coder:7b:q4_k_m",
"instances": 3
}
]
}
```
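
A small sketch that polls both federation endpoints and prints a one-line summary per peer, using only the response fields documented above:

```python
# federation_check.py - summarize federation state from any node.
import requests

BASE = "http://localhost:8000"

status = requests.get(f"{BASE}/v1/federation/status", timeout=10).json()
print(f"federation enabled={status['enabled']} "
      f"peers={status['healthy_peers']}/{status['total_peers']} "
      f"strategy={status['strategy']}")

peers = requests.get(f"{BASE}/v1/federation/peers", timeout=10).json()
for peer in peers["peers"]:
    print(f"  {peer['name']} @ {peer['host']}:{peer['port']} "
          f"({peer['instances']}x {peer['model_id']})")
```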
## Troubleshooting

### Common Issues

**Issue: "No module named 'llama_cpp'"**

**Solution:**

```bash
# Install with pre-built wheel (recommended)
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121
# Or CPU-only
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
Issue: "CUDA not detected" on Windows
Solution:
- Install NVIDIA drivers: https://www.nvidia.com/drivers
- Verify with:
nvidia-smi - Reinstall with CUDA support:
pip uninstall llama-cpp-python
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121
Issue: "Out of memory" errors
Solution:
# Reduce instances
python main.py --instances 2
# Or use smaller model
python main.py --model qwen2.5-coder:3b:q4
```

**Issue: Slow performance on CPU**

**Solution:**

- Use smaller models (3B instead of 7B)
- Use Q4 quantization instead of Q6
- Reduce the number of instances to 2-3
- Close other applications
Issue: "No suitable model found"
Solution: Your system has less than 2GB available memory. Try:
- Close other applications
- Use CPU-only mode (automatic if no GPU)
- Add more RAM or use a machine with GPU
Issue: Models not downloading
Solution:
# Check internet connection
ping huggingface.co
# Try manual download
python main.py --download-only
# Check cache directory
ls ~/.local_swarm/models
```
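
If the built-in downloader keeps failing, you can fetch a GGUF file by hand with `huggingface_hub` and place it in the cache directory. A hedged sketch: the `repo_id` and `filename` below are illustrative examples, not necessarily the exact artifacts Local Swarm expects.

```python
# Requires: pip install huggingface_hub
import os
from huggingface_hub import hf_hub_download

# NOTE: repo_id and filename are examples; substitute the model your
# setup actually uses.
path = hf_hub_download(
    repo_id="Qwen/Qwen2.5-Coder-7B-Instruct-GGUF",
    filename="qwen2.5-coder-7b-instruct-q4_k_m.gguf",
    local_dir=os.path.expanduser("~/.local_swarm/models"),  # match cache_dir
)
print("downloaded to", path)
```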
### Platform-Specific Issues

**Windows:**

- Ensure Python is on your PATH
- Run PowerShell as Administrator if needed
- Install the Visual C++ Redistributable
**macOS:**

- Install the Xcode Command Line Tools:

  ```bash
  xcode-select --install
  ```

- You may need to allow llama.cpp in Security preferences
**Linux:**

- Install build essentials:

  ```bash
  sudo apt-get install build-essential
  ```

- For AMD GPUs: install ROCm drivers
- For Intel GPUs: install the oneAPI toolkit
## Advanced Configuration

### Configuration File (config.yaml)

Create `config.yaml` in the project root:

```yaml
server:
host: "127.0.0.1"
port: 8000
swarm:
consensus_strategy: "similarity" # similarity, quality, fastest
min_instances: 2
max_instances: 5
federation:
enabled: false
discovery_port: 8765
federation_port: 8766
max_peers: 10
hardware:
gpu_memory_fraction: 1.0 # Use 100% of GPU VRAM
ram_fraction: 0.5 # Use 50% of system RAM for CPU
models:
cache_dir: "~/.local_swarm/models"
preferred_models:
- qwen2.5-coder
- deepseek-coder
```

### Environment Variables

```bash
# Custom cache directory
export LOCAL_SWARM_CACHE_DIR="/path/to/models"
# Debug mode
export LOCAL_SWARM_DEBUG=1
# Custom config file
export LOCAL_SWARM_CONFIG="/path/to/config.yaml"
```
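
To make the precedence concrete, here is a small sketch of how a loader could resolve these settings, with environment variables overriding `config.yaml` defaults. It is illustrative only and mirrors the options above; it is not Local Swarm's actual loader.

```python
# config_loader.py - illustrative resolution of env vars over config.yaml.
# Not the project's real loader; it just mirrors the options documented above.
import os
import yaml  # pip install pyyaml

def load_config():
    # LOCAL_SWARM_CONFIG points at an alternate config file.
    path = os.environ.get("LOCAL_SWARM_CONFIG", "config.yaml")
    config = {}
    if os.path.exists(path):
        with open(path) as f:
            config = yaml.safe_load(f) or {}

    # LOCAL_SWARM_CACHE_DIR overrides models.cache_dir from the file.
    cache_dir = os.environ.get(
        "LOCAL_SWARM_CACHE_DIR",
        config.get("models", {}).get("cache_dir", "~/.local_swarm/models"),
    )
    config.setdefault("models", {})["cache_dir"] = os.path.expanduser(cache_dir)

    # LOCAL_SWARM_DEBUG=1 enables debug mode.
    config["debug"] = os.environ.get("LOCAL_SWARM_DEBUG") == "1"
    return config

if __name__ == "__main__":
    print(load_config())
```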
## Performance Tuning

### For Maximum Speed

```bash
# Use smaller model
python main.py --model qwen2.5-coder:3b:q4
# Reduce instances (less memory contention)
python main.py --instances 2
# Skip consensus (single worker)
# Edit config: consensus_strategy: "fastest"
```

### For Maximum Quality

```bash
# Use largest model that fits
python main.py --model qwen2.5-coder:7b:q6
# More instances for better consensus
python main.py --instances 5
# Use quality consensus strategy
# Edit config: consensus_strategy: "quality"
```

### For Balanced Performance

```bash
# Recommended defaults (automatic)
python main.py
# Or explicitly
python main.py --model qwen2.5-coder:7b:q4
```

### Memory Usage by Model

| Model Size | Q4 VRAM | Q5 VRAM | Q6 VRAM |
|---|---|---|---|
| 1B-3B | 0.7-2GB | 0.9-2.5GB | 1.1-3GB |
| 7B | 4.5GB | 5.2GB | 6.0GB |
| 13B-15B | 8-9GB | 9.5-11GB | 11-13GB |
**Recommended:** Use Q4_K_M for the best speed/quality balance.
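
As a rough illustration of how to turn this table into a decision, the sketch below picks the largest 7B quantization that fits a given VRAM budget. The thresholds come straight from the table; the helper itself is hypothetical, not part of Local Swarm.

```python
# Hypothetical helper (not part of Local Swarm): choose the largest 7B
# quantization that fits a VRAM budget, using the footprints from the
# table above plus ~10% headroom for KV cache and activations.
VRAM_7B_GB = {"q6": 6.0, "q5": 5.2, "q4": 4.5}  # values from the table

def pick_7b_quant(available_vram_gb):
    """Return the best-fitting quantization name, or None if nothing fits."""
    for quant in ("q6", "q5", "q4"):  # prefer higher quality first
        if VRAM_7B_GB[quant] * 1.1 <= available_vram_gb:
            return quant
    return None  # fall back to a smaller model or CPU

print(pick_7b_quant(8.0))  # -> 'q6' on an 8GB card
print(pick_7b_quant(5.0))  # -> 'q4'
```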
## MCP Server Configuration

### Enable MCP Server

```bash
python main.py --mcp
```

### MCP Tools Available

When MCP is enabled, AI assistants can use:

- `get_hardware_info` - Query system capabilities
- `get_swarm_status` - Check swarm health
- `generate_code` - Generate with consensus
- `list_available_models` - Browse models
- `get_worker_details` - Worker statistics

### Testing MCP

```bash
# List available tools
mcp-cli call local-swarm list_tools
# Call a tool
mcp-cli call local-swarm call_tool get_swarm_status
```
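
You can also exercise the tools from Python with the official `mcp` SDK. A sketch under the assumption that `python main.py --mcp` exposes the server over stdio; if it uses a different transport, adjust the client accordingly.

```python
# Requires: pip install mcp
# Assumes the MCP server speaks stdio when launched with --mcp.
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    params = StdioServerParameters(command="python", args=["main.py", "--mcp"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Enumerate the tools listed above, then call one of them.
            tools = await session.list_tools()
            print("tools:", [t.name for t in tools.tools])
            result = await session.call_tool("get_swarm_status", {})
            print("swarm status:", result)

asyncio.run(main())
```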
## Network Federation

### Setting Up a Federated Swarm

On each machine in your network:

```bash
# Machine 1 (Windows PC with RTX 4060)
python main.py --federation --port 8000
# Machine 2 (Mac Mini M1)
python main.py --federation --port 8000
# Machine 3 (Linux with AMD GPU)
python main.py --federation --port 8000
```

Machines will auto-discover each other via mDNS.
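
To see what discovery looks like on the wire, here is a hedged sketch using the `zeroconf` package to browse for peers. The `_local-swarm._tcp.local.` service type is a placeholder, since the name the project actually registers is not shown here.

```python
# Requires: pip install zeroconf
# NOTE: the service type below is a guess; substitute whatever
# Local Swarm actually registers via mDNS.
import time
from zeroconf import Zeroconf, ServiceBrowser, ServiceListener

SERVICE_TYPE = "_local-swarm._tcp.local."  # placeholder service type

class SwarmListener(ServiceListener):
    def add_service(self, zc, type_, name):
        info = zc.get_service_info(type_, name)
        if info:
            print(f"discovered {name} at {info.parsed_addresses()}:{info.port}")

    def remove_service(self, zc, type_, name):
        print(f"lost {name}")

    def update_service(self, zc, type_, name):
        pass

zc = Zeroconf()
ServiceBrowser(zc, SERVICE_TYPE, SwarmListener())
time.sleep(10)  # browse for ten seconds
zc.close()
```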
### Verify Federation

```bash
curl http://localhost:8000/v1/federation/status
curl http://localhost:8000/v1/federation/peers
```

## Getting Help

- GitHub Issues: https://github.com/sleepyeldrazi/local_swarm/issues
- Interactive Help: Run `python main.py` and select `[t] Tips & Help`
- Hardware Detection: Run `python main.py --detect`
## License

MIT License - see the LICENSE file.