
Local Swarm

Run a swarm of local LLMs on your hardware. Multiple model instances work together to give you the best answer through consensus voting.

What It Does

  • Auto-detects your hardware (NVIDIA, AMD, Intel, Apple Silicon, Qualcomm, or CPU)
  • Downloads and runs multiple LLM instances optimized for your VRAM/RAM
  • Uses consensus voting - all instances answer, best response wins
  • Connects multiple machines on your network for a "hive mind" effect
  • Provides an OpenAI-compatible API at http://localhost:17615/v1
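
Because the API is OpenAI-compatible, any OpenAI client library can talk to it. A minimal sketch using the openai Python package (the model name local-swarm mirrors the opencode config shown later; treat it as an assumption about your setup):

from openai import OpenAI

# Point the client at the local swarm instead of api.openai.com.
client = OpenAI(base_url="http://localhost:17615/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="local-swarm",
    messages=[{"role": "user", "content": "Write a Python hello world."}],
)
print(response.choices[0].message.content)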

Quick Start

# Clone and install
git clone https://github.com/yourusername/local_swarm.git
cd local_swarm
pip install -r requirements.txt

# Run it
python main.py

On first run, it will:

  1. Detect your hardware
  2. Pick the best model and quantization
  3. Download the model (one-time)
  4. Start multiple LLM workers
  5. Expose the API at http://localhost:17615

Usage

Interactive Mode (default)

python main.py

Shows a menu with:

  • Recommended configuration (auto-selected)
  • Browse all compatible models
  • Custom configuration wizard

Auto Mode (no menu)

python main.py --auto

With Other Options

python main.py --model qwen:3b:q4      # Use specific model
python main.py --instances 4           # Force 4 workers
python main.py --port 8080             # Custom port
python main.py --detect                # Show hardware info only
python main.py --federation            # Enable network federation
python main.py --mcp                   # Enable MCP server

Connect to Opencode

Add to your opencode config:

{
  "model": {
    "provider": "openai",
    "base_url": "http://localhost:17615/v1",
    "api_key": "not-needed",
    "model": "local-swarm"
  }
}

Network Federation (Hive Mind)

Run on multiple machines to combine their power:

# Machine 1 (Windows with RTX 4060)
python main.py --auto --federation

# Machine 2 (Mac Mini M1)
python main.py --auto --federation

# Machine 3 (Old laptop)
python main.py --auto --federation

Features

  • Parallel Execution: Local and peers generate simultaneously for faster consensus
  • Streaming Support: Federation works with streaming responses
  • Winner Tracking: Logs which node (local or peer) won consensus voting
  • Token Usage: Reports accurate token counts for federated responses

Machines auto-discover each other and vote together on every request.
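
Each node exposes the peers it has found via the federation endpoint listed under API Endpoints. A quick check with requests (assuming the endpoint returns JSON):

import requests

# List the peers this node has discovered (requires --federation).
peers = requests.get("http://localhost:17615/v1/federation/peers").json()
print(peers)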

Consensus with Federation

  1. Your prompt goes to all LLM instances across all machines
  2. Local swarm and all peers generate in parallel (roughly 2x faster than sequential generation)
  3. Wait for all nodes to complete generation
  4. Run global consensus across all responses
  5. Use federated result (highest confidence from all nodes)
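
A simplified sketch of this flow; generate_local and generate_on_peer are hypothetical stand-ins for the project's real generation calls, and the final vote here is a toy majority:

import asyncio

# Hypothetical stand-ins for the actual local and peer generation calls.
async def generate_local(prompt):
    return "local answer"

async def generate_on_peer(peer, prompt):
    return "peer answer"

async def federated_generate(prompt, peers):
    # Step 2: the local swarm and every peer generate at the same time.
    tasks = [generate_local(prompt)] + [generate_on_peer(p, prompt) for p in peers]
    responses = await asyncio.gather(*tasks)  # step 3: wait for all nodes
    # Steps 4-5: global consensus across all responses (toy majority here).
    return max(responses, key=responses.count)

print(asyncio.run(federated_generate("hi", ["peer-a", "peer-b"])))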

Token Reporting

Federation provides accurate token counts:

  • Prompt tokens: Counted using tiktoken (cl100k_base encoding)
  • Completion tokens: Counted using tiktoken for federated response
  • Total tokens: Sum of prompt + completion tokens
  • Included in: Final streaming chunk and non-streaming responses
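
A minimal sketch of the counting described above, using tiktoken directly:

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

prompt = "Explain consensus voting."
completion = "Consensus voting picks the answer most instances agree on."

prompt_tokens = len(enc.encode(prompt))
completion_tokens = len(enc.encode(completion))
total_tokens = prompt_tokens + completion_tokens  # reported as usage
print(prompt_tokens, completion_tokens, total_tokens)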

Federation with Streaming

Federation works with streaming responses:

  • Local swarm and all peers generate in parallel
  • Stream content from the local swarm while waiting for federation
  • Switch to the federated result when consensus completes
  • Full token reporting in streaming mode
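
From the client side this looks like any OpenAI-compatible streaming call. A sketch with the openai package, under the same assumptions as the earlier example:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:17615/v1", api_key="not-needed")

stream = client.chat.completions.create(
    model="local-swarm",
    messages=[{"role": "user", "content": "Summarize consensus voting."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)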

How Consensus Works

  1. Your prompt goes to all LLM instances
  2. Each instance generates a response independently
  3. The consensus algorithm picks the best answer:
    • Similarity (default): Groups responses by meaning and picks the largest group (a toy sketch follows this list)
    • Quality: Scores on completeness, code blocks, structure
    • Fastest: Returns the quickest response
    • Majority: Simple text match voting
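
A toy sketch of the similarity strategy, using difflib.SequenceMatcher as a stand-in for whatever similarity measure the project actually uses:

from difflib import SequenceMatcher

def similarity_consensus(responses, threshold=0.8):
    # Group responses that read alike, then answer from the largest group.
    groups = []
    for resp in responses:
        for group in groups:
            if SequenceMatcher(None, resp, group[0]).ratio() >= threshold:
                group.append(resp)
                break
        else:
            groups.append([resp])
    return max(groups, key=len)[0]

print(similarity_consensus(["yes, 42", "yes, 42!", "no idea"]))  # "yes, 42"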

Configuration

Create config.yaml:

server:
  host: "127.0.0.1"
  port: 17615

swarm:
  consensus_strategy: "similarity"  # similarity, quality, fastest, majority
  min_instances: 2
  max_instances: 8

federation:
  enabled: true
  discovery_port: 8765
  max_peers: 10
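
To inspect the same settings from a script, a minimal sketch with PyYAML (how main.py itself loads the file is an assumption):

import yaml

with open("config.yaml") as f:
    config = yaml.safe_load(f)

print(config["swarm"]["consensus_strategy"])  # "similarity"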

Supported Hardware

Hardware        Backend             Notes
NVIDIA GPU      llama.cpp (CUDA)    Best performance
AMD GPU         llama.cpp (ROCm)    Linux/Windows
Intel GPU       llama.cpp (SYCL)    Linux/Windows
Apple Silicon   MLX                 Native Metal
Qualcomm        llama.cpp (CPU)     Android/Termux
CPU-only        llama.cpp           Slower but works

Supported Models

  • Qwen 2.5 Coder (3B, 7B, 14B) - Recommended
  • DeepSeek Coder (1.3B, 6.7B, 33B)
  • CodeLlama (7B, 13B, 34B)

All support GGUF quantization (Q4_K_M recommended).

API Endpoints

  • GET /v1/models - List available models
  • POST /v1/chat/completions - Chat completion with consensus
  • GET /health - Health check
  • GET /v1/federation/peers - List discovered peers (when federation enabled)
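
A quick smoke test of the read-only endpoints with requests (assuming both return JSON):

import requests

base = "http://localhost:17615"
print(requests.get(f"{base}/health").json())     # health check
print(requests.get(f"{base}/v1/models").json())  # available models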

Troubleshooting

Out of Memory

python main.py --instances 2           # Reduce workers
python main.py --model qwen:3b:q4      # Use smaller model

Slow Performance

  • Check GPU utilization with nvidia-smi
  • Reduce instances to avoid contention
  • Use Q4 quantization instead of Q6

CUDA Not Detected (Windows)

nvidia-smi  # Check drivers
pip uninstall llama-cpp-python
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121

macOS: MLX Not Found

pip install mlx-lm

Project Structure

local_swarm/
├── main.py                   # CLI entry point
├── src/
│   ├── hardware/            # GPU detection (NVIDIA, AMD, Intel, Apple, Qualcomm)
│   ├── models/              # Model registry, selection, downloading
│   ├── backends/            # llama.cpp and MLX backends
│   ├── swarm/               # Worker management and consensus
│   ├── network/             # Federation and peer discovery
│   ├── api/                 # OpenAI-compatible API server
│   └── tools/               # Tool execution (read, write, bash)
└── docs/                    # Documentation

License

MIT License
