
Local Swarm

Run a swarm of local LLMs on your hardware. Multiple model instances work together to give you the best answer through consensus voting.

What It Does

  • Auto-detects your hardware (NVIDIA, AMD, Intel, Apple Silicon, Qualcomm, or CPU)
  • Downloads and runs multiple LLM instances optimized for your VRAM/RAM
  • Uses consensus voting - all instances answer, best response wins
  • Connects multiple machines on your network for a "hive mind" effect
  • Provides an OpenAI-compatible API at http://localhost:17615/v1
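
Because the API is OpenAI-compatible, any OpenAI client library can talk to it. A minimal sketch using the openai Python package (the model name local-swarm mirrors the opencode config shown later; treat it as an assumption about your setup):

from openai import OpenAI

# Point the client at the local swarm instead of api.openai.com.
client = OpenAI(base_url="http://localhost:17615/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="local-swarm",
    messages=[{"role": "user", "content": "Write a Python hello world."}],
)
print(response.choices[0].message.content)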

Quick Start

# Clone and install
git clone https://github.com/yourusername/local_swarm.git
cd local_swarm
pip install -r requirements.txt

# Run it
python main.py

On first run, it will:

  1. Detect your hardware
  2. Pick the best model and quantization
  3. Download the model (one-time)
  4. Start multiple LLM workers
  5. Expose the API at http://localhost:17615

Usage

Interactive Mode (default)

python main.py

Shows a menu with:

  • Recommended configuration (auto-selected)
  • Browse all compatible models
  • Custom configuration wizard

Auto Mode (no menu)

python main.py --auto

With Other Options

python main.py --model qwen:3b:q4      # Use specific model
python main.py --instances 4           # Force 4 workers
python main.py --port 8080             # Custom port
python main.py --detect                # Show hardware info only
python main.py --federation            # Enable network federation
python main.py --mcp                   # Enable MCP server

Connect to Opencode

Add to your opencode config:

{
  "model": {
    "provider": "openai",
    "base_url": "http://localhost:17615/v1",
    "api_key": "not-needed",
    "model": "local-swarm"
  }
}

Network Federation (Hive Mind)

Run on multiple machines to combine their power:

# Machine 1 (Windows with RTX 4060)
python main.py --auto --federation

# Machine 2 (Mac Mini M1)
python main.py --auto --federation

# Machine 3 (Old laptop)
python main.py --auto --federation

Features

  • Parallel Execution: Local and peers generate simultaneously for faster consensus
  • Streaming Support: Federation works with streaming responses
  • Winner Tracking: Logs which node (local or peer) won consensus voting
  • Token Usage: Reports accurate token counts for federated responses

Machines auto-discover each other and vote together on every request.
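
Each node exposes the peers it has found via the federation endpoint listed under API Endpoints. A quick check with requests (assuming the endpoint returns JSON):

import requests

# List the peers this node has discovered (requires --federation).
peers = requests.get("http://localhost:17615/v1/federation/peers").json()
print(peers)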

Consensus with Federation

  1. Your prompt goes to all LLM instances across all machines
  2. Local swarm and all peers generate in parallel (roughly 2x faster than sequential generation)
  3. Wait for all nodes to complete generation
  4. Run global consensus across all responses
  5. Use federated result (highest confidence from all nodes)
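
A simplified sketch of this flow; generate_local and generate_on_peer are hypothetical stand-ins for the project's real generation calls, and the final vote here is a toy majority:

import asyncio

# Hypothetical stand-ins for the actual local and peer generation calls.
async def generate_local(prompt):
    return "local answer"

async def generate_on_peer(peer, prompt):
    return "peer answer"

async def federated_generate(prompt, peers):
    # Step 2: the local swarm and every peer generate at the same time.
    tasks = [generate_local(prompt)] + [generate_on_peer(p, prompt) for p in peers]
    responses = await asyncio.gather(*tasks)  # step 3: wait for all nodes
    # Steps 4-5: global consensus across all responses (toy majority here).
    return max(responses, key=responses.count)

print(asyncio.run(federated_generate("hi", ["peer-a", "peer-b"])))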

Token Reporting

Federation provides accurate token counts:

  • Prompt tokens: Counted using tiktoken (cl100k_base encoding)
  • Completion tokens: Counted using tiktoken for federated response
  • Total tokens: Sum of prompt + completion tokens
  • Included in: Final streaming chunk and non-streaming responses
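
A minimal sketch of the counting described above, using tiktoken directly:

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

prompt = "Explain consensus voting."
completion = "Consensus voting picks the answer most instances agree on."

prompt_tokens = len(enc.encode(prompt))
completion_tokens = len(enc.encode(completion))
total_tokens = prompt_tokens + completion_tokens  # reported as usage
print(prompt_tokens, completion_tokens, total_tokens)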

Federation with Streaming

Federation works with streaming responses:

  • Local swarm and all peers generate in parallel
  • Stream content from the local swarm while waiting for federation
  • Switch to the federated result when consensus completes
  • Full token reporting in streaming mode
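
From the client side this looks like any OpenAI-compatible streaming call. A sketch with the openai package, under the same assumptions as the earlier example:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:17615/v1", api_key="not-needed")

stream = client.chat.completions.create(
    model="local-swarm",
    messages=[{"role": "user", "content": "Summarize consensus voting."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)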

How Consensus Works

  1. Your prompt goes to all LLM instances
  2. Each instance generates a response independently
  3. The consensus algorithm picks the best answer:
    • Similarity (default): Groups responses by meaning and picks the largest group (a toy sketch follows this list)
    • Quality: Scores on completeness, code blocks, structure
    • Fastest: Returns the quickest response
    • Majority: Simple text match voting
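
A toy sketch of the similarity strategy, using difflib.SequenceMatcher as a stand-in for whatever similarity measure the project actually uses:

from difflib import SequenceMatcher

def similarity_consensus(responses, threshold=0.8):
    # Group responses that read alike, then answer from the largest group.
    groups = []
    for resp in responses:
        for group in groups:
            if SequenceMatcher(None, resp, group[0]).ratio() >= threshold:
                group.append(resp)
                break
        else:
            groups.append([resp])
    return max(groups, key=len)[0]

print(similarity_consensus(["yes, 42", "yes, 42!", "no idea"]))  # "yes, 42"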

Configuration

Create config.yaml:

server:
  host: "127.0.0.1"
  port: 17615

swarm:
  consensus_strategy: "similarity"  # similarity, quality, fastest, majority
  min_instances: 2
  max_instances: 8

federation:
  enabled: true
  discovery_port: 8765
  max_peers: 10
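
To inspect the same settings from a script, a minimal sketch with PyYAML (how main.py itself loads the file is an assumption):

import yaml

with open("config.yaml") as f:
    config = yaml.safe_load(f)

print(config["swarm"]["consensus_strategy"])  # "similarity"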

Supported Hardware

Hardware        Backend             Notes
NVIDIA GPU      llama.cpp (CUDA)    Best performance
AMD GPU         llama.cpp (ROCm)    Linux/Windows
Intel GPU       llama.cpp (SYCL)    Linux/Windows
Apple Silicon   MLX                 Native Metal
Qualcomm        llama.cpp (CPU)     Android/Termux
CPU-only        llama.cpp           Slower but works

Supported Models

  • Qwen 2.5 Coder (3B, 7B, 14B) - Recommended
  • DeepSeek Coder (1.3B, 6.7B, 33B)
  • CodeLlama (7B, 13B, 34B)

All support GGUF quantization (Q4_K_M recommended).

API Endpoints

  • GET /v1/models - List available models
  • POST /v1/chat/completions - Chat completion with consensus
  • GET /health - Health check
  • GET /v1/federation/peers - List discovered peers (when federation enabled)
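
A quick smoke test of the read-only endpoints with requests (assuming both return JSON):

import requests

base = "http://localhost:17615"
print(requests.get(f"{base}/health").json())     # health check
print(requests.get(f"{base}/v1/models").json())  # available models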

Troubleshooting

Out of Memory

python main.py --instances 2           # Reduce workers
python main.py --model qwen:3b:q4      # Use smaller model

Slow Performance

  • Check GPU utilization with nvidia-smi
  • Reduce instances to avoid contention
  • Use Q4 quantization instead of Q6

CUDA Not Detected (Windows)

nvidia-smi  # Check drivers
pip uninstall llama-cpp-python
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121

macOS: MLX Not Found

pip install mlx-lm

Project Structure

local_swarm/
├── main.py                   # CLI entry point
├── src/
│   ├── hardware/            # GPU detection (NVIDIA, AMD, Intel, Apple, Qualcomm)
│   ├── models/              # Model registry, selection, downloading
│   ├── backends/            # llama.cpp and MLX backends
│   ├── swarm/               # Worker management and consensus
│   ├── network/             # Federation and peer discovery
│   ├── api/                 # OpenAI-compatible API server
│   └── tools/               # Tool execution (read, write, bash)
└── docs/                    # Documentation

License

MIT License
