# Local Swarm
Run a swarm of local LLMs on your hardware. Multiple models work together to give you the best answer through consensus voting.
## What It Does
- Auto-detects your hardware (NVIDIA, AMD, Intel, Apple Silicon, Qualcomm, or CPU)
- Downloads and runs multiple LLM instances optimized for your VRAM/RAM
- Uses consensus voting - all instances answer, best response wins
- Connects multiple machines on your network for a "hive mind" effect
- Provides an OpenAI-compatible API at `http://localhost:17615/v1`
## Quick Start

```bash
# Clone and install
git clone https://github.com/yourusername/local_swarm.git
cd local_swarm
pip install -r requirements.txt

# Run it
python main.py
```
On first run, it will:
- Detect your hardware
- Pick the best model and quantization
- Download the model (one-time)
- Start multiple LLM workers
- Expose the API at `http://localhost:17615` (see the quick check below)
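
Once it is up, a quick way to confirm the server is responding is to hit the health and model-list endpoints (documented under API Endpoints below). A minimal sketch using `requests`; the exact response bodies depend on your setup, so they are just printed:

```python
import requests

BASE = "http://localhost:17615"

# Confirm the swarm is up and see which model it is serving.
# The payloads are printed as-is; their exact shape is not guaranteed
# beyond being JSON.
print(requests.get(f"{BASE}/health", timeout=5).json())
print(requests.get(f"{BASE}/v1/models", timeout=5).json())
```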
## Usage
### Interactive Mode (default)

```bash
python main.py
```
Shows a menu with:
- Recommended configuration (auto-selected)
- Browse all compatible models
- Custom configuration wizard
### Auto Mode (no menu)

```bash
python main.py --auto
```
### With Other Options

```bash
python main.py --model qwen:3b:q4   # Use specific model
python main.py --instances 4        # Force 4 workers
python main.py --port 8080          # Custom port
python main.py --detect             # Show hardware info only
python main.py --federation         # Enable network federation
python main.py --mcp                # Enable MCP server
```
## Connect to Opencode
Add to your opencode config:
```json
{
  "model": {
    "provider": "openai",
    "base_url": "http://localhost:17615/v1",
    "api_key": "not-needed",
    "model": "local-swarm"
  }
}
```
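
Anything that speaks the OpenAI API can use the same endpoint. As a sketch, here is the official `openai` Python client (v1+) pointed at the swarm, assuming the `local-swarm` model name from the config above:

```python
from openai import OpenAI

# Point the standard OpenAI client at the local swarm instead of api.openai.com.
client = OpenAI(base_url="http://localhost:17615/v1", api_key="not-needed")

reply = client.chat.completions.create(
    model="local-swarm",
    messages=[{"role": "user", "content": "Write a Python function that reverses a string."}],
)
print(reply.choices[0].message.content)
```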
## Network Federation (Hive Mind)
Run on multiple machines to combine their power:
```bash
# Machine 1 (Windows with RTX 4060)
python main.py --auto --federation

# Machine 2 (Mac Mini M1)
python main.py --auto --federation

# Machine 3 (Old laptop)
python main.py --auto --federation
```
Machines auto-discover each other and vote together on every request.
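
To verify that the nodes actually found each other, you can query the peers endpoint on any machine (listed under API Endpoints below). This is only a sketch; the shape of the response is an assumption, so the raw JSON is printed:

```python
import requests

# List the peers this node has discovered over the network.
peers = requests.get("http://localhost:17615/v1/federation/peers", timeout=5).json()
print(peers)
```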
## How Consensus Works
1. Your prompt goes to all LLM instances
2. Each instance generates a response independently
3. The consensus algorithm picks the best answer (see the sketch below):
   - Similarity (default): groups responses by meaning and picks the largest group
   - Quality: scores responses on completeness, code blocks, and structure
   - Fastest: returns the quickest response
   - Majority: simple text-match voting
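
As a rough illustration of the default similarity strategy, here is a self-contained sketch that groups responses by pairwise text similarity and returns one answer from the largest group. It is not the project's actual implementation, which groups by meaning; `difflib` string similarity stands in for that here:

```python
from difflib import SequenceMatcher

def similarity_consensus(responses: list[str], threshold: float = 0.8) -> str:
    """Group responses by pairwise similarity; return one from the largest group."""
    groups: list[list[str]] = []
    for resp in responses:
        for group in groups:
            # Compare against the group's first member as its representative.
            if SequenceMatcher(None, resp, group[0]).ratio() >= threshold:
                group.append(resp)
                break
        else:
            groups.append([resp])
    return max(groups, key=len)[0]

answers = [
    "def add(a, b): return a + b",
    "def add(a, b):\n    return a + b",
    "Use operator.add from the standard library.",
]
print(similarity_consensus(answers))  # picks an answer from the two-member group
```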
## Configuration
Create `config.yaml`:

```yaml
server:
  host: "127.0.0.1"
  port: 17615

swarm:
  consensus_strategy: "similarity"  # similarity, quality, fastest, majority
  min_instances: 2
  max_instances: 8

federation:
  enabled: true
  discovery_port: 8765
  max_peers: 10
```
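
A minimal sketch of how such a file might be read, falling back to the defaults above when keys are missing. The loader shown is hypothetical, not the project's actual config code:

```python
import yaml  # pip install pyyaml

# Hypothetical loader: read config.yaml if present, otherwise use defaults.
try:
    with open("config.yaml") as f:
        cfg = yaml.safe_load(f) or {}
except FileNotFoundError:
    cfg = {}

strategy = cfg.get("swarm", {}).get("consensus_strategy", "similarity")
port = cfg.get("server", {}).get("port", 17615)
print(f"consensus={strategy}, port={port}")
```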
## Supported Hardware
| Hardware | Backend | Notes |
|---|---|---|
| NVIDIA GPU | llama.cpp (CUDA) | Best performance |
| AMD GPU | llama.cpp (ROCm) | Linux/Windows |
| Intel GPU | llama.cpp (SYCL) | Linux/Windows |
| Apple Silicon | MLX | Native Metal |
| Qualcomm | llama.cpp (CPU) | Android/Termux |
| CPU-only | llama.cpp | Slower but works |
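
As a rough sketch of how the backend column above could be chosen, here is a simplified detection routine. The real logic lives in `src/hardware/` and also accounts for VRAM and drivers, so treat this as illustrative only:

```python
import platform
import shutil

def guess_backend() -> str:
    """Simplified stand-in for the detection in src/hardware/."""
    if platform.system() == "Darwin" and platform.machine() == "arm64":
        return "MLX"                      # Apple Silicon: native Metal
    if shutil.which("nvidia-smi"):
        return "llama.cpp (CUDA)"
    if shutil.which("rocm-smi"):
        return "llama.cpp (ROCm)"
    return "llama.cpp (CPU)"

print(guess_backend())
```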
## Supported Models
- Qwen 2.5 Coder (3B, 7B, 14B) - Recommended
- DeepSeek Coder (1.3B, 6.7B, 33B)
- CodeLlama (7B, 13B, 34B)
All support GGUF quantization (Q4_K_M recommended).
## API Endpoints
- `GET /v1/models` - List available models
- `POST /v1/chat/completions` - Chat completion with consensus
- `GET /health` - Health check
- `GET /v1/federation/peers` - List discovered peers (when federation enabled)
## Troubleshooting
### Out of Memory

```bash
python main.py --instances 2       # Reduce workers
python main.py --model qwen:3b:q4  # Use smaller model
```
### Slow Performance

- Check GPU utilization with `nvidia-smi`
- Reduce instances to avoid contention
- Use Q4 quantization instead of Q6
### CUDA Not Detected (Windows)

```bash
nvidia-smi  # Check drivers
pip uninstall llama-cpp-python
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121
```
### macOS: MLX Not Found

```bash
pip install mlx-lm
```
## Project Structure
```
local_swarm/
├── main.py          # CLI entry point
├── src/
│   ├── hardware/    # GPU detection (NVIDIA, AMD, Intel, Apple, Qualcomm)
│   ├── models/      # Model registry, selection, downloading
│   ├── backends/    # llama.cpp and MLX backends
│   ├── swarm/       # Worker management and consensus
│   ├── network/     # Federation and peer discovery
│   ├── api/         # OpenAI-compatible API server
│   └── tools/       # Tool execution (read, write, bash)
└── docs/            # Documentation
```
## License
MIT License