local_swarm/NETWORK.md
sleepy 429a3a4e3b Add NETWORK.md documenting federation status and TODO
Documents the current state of network federation:
- What's working (discovery, federation client, network binding)
- What's missing (integration in main.py)
- Relevant files and functions
- Scope and limitations
- Comprehensive TODO list for implementation

Federation exists but isn't wired up to the main application flow.
2026-02-24 04:14:18 +01:00


# Network Federation Status
## Overview
Local Swarm has a federation system designed to allow multiple instances to collaborate on the same network, enabling distributed consensus and load balancing across multiple machines.
## Current Implementation Status
### ✅ What's Working
#### 1. Network Discovery (`src/network/discovery.py`)
**Purpose**: Automatic discovery of other Local Swarm instances on the local network using mDNS/Bonjour.
**Key Components**:
- `SwarmDiscovery` class - Main discovery service
- `PeerInfo` dataclass - Stores information about peer swarms
- `start_advertising()` - Announces this swarm to the network
- `start_discovery()` - Listens for other swarms on the network
- `create_discovery_service()` - Factory function to create discovery instance
**How It Works**:
- Uses mDNS service type: `_local-swarm._tcp.local.`
- Advertises on port 63323 (discovery) + API port (17615)
- Broadcasts: version, instances, model_id, hardware_summary
- Peers time out after 60 seconds if not seen
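
The 60-second peer timeout can be pictured with a small sketch. The field names (`name`, `host`, `api_port`, `last_seen`) and the `prune_stale_peers` helper are assumptions made for illustration; the real `PeerInfo` in `src/network/discovery.py` may store different data.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional

PEER_TIMEOUT_SECONDS = 60  # timeout stated in the list above


@dataclass
class PeerInfo:
    """Illustrative stand-in for the real PeerInfo dataclass (fields are assumed)."""
    name: str
    host: str
    api_port: int = 17615
    last_seen: float = field(default_factory=time.monotonic)


def prune_stale_peers(
    peers: Dict[str, PeerInfo], now: Optional[float] = None
) -> Dict[str, PeerInfo]:
    """Return only the peers seen within the last PEER_TIMEOUT_SECONDS."""
    now = time.monotonic() if now is None else now
    return {
        name: peer
        for name, peer in peers.items()
        if now - peer.last_seen <= PEER_TIMEOUT_SECONDS
    }
```

Passing `now` explicitly keeps the helper deterministic in tests; in the running service it would default to the monotonic clock.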
#### 2. Federation Client (`src/network/federation.py`)
**Purpose**: Communication protocol between peer swarms.
**Key Components**:
- `FederationClient` class - HTTP client for peer communication
- `FederatedSwarm` class - Wraps local swarm with federation logic
- `request_vote()` - Gets generation results from peers
- `generate_with_federation()` - Coordinates distributed generation
- Federation strategies: `best_of_n`, `weighted_vote`, `first_valid`
**API Endpoints** (not yet exposed):
- `POST /v1/federation/vote` - Request generation from peer
- `GET /v1/federation/health` - Check peer health
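
This document does not spell out how the three strategies pick a winner, so the sketch below shows only one plausible reading. The `Vote` record, its `score` and `weight` fields, and `select_response` are hypothetical names, not the actual `PeerVote` or `FederatedSwarm` API.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Vote:
    """Hypothetical vote record; the real PeerVote may look different."""
    source: str          # "local" or a peer name
    text: str            # generated response ("" means generation failed)
    score: float = 0.0   # assumed quality score attached to the response
    weight: float = 1.0  # assumed weight of the source


def select_response(votes: List[Vote], strategy: str = "best_of_n") -> Optional[str]:
    """Pick one response text according to the named strategy (assumed semantics)."""
    valid = [v for v in votes if v.text.strip()]
    if not valid:
        return None
    if strategy == "first_valid":
        # Take the first non-empty response in arrival order.
        return valid[0].text
    if strategy == "best_of_n":
        # Take the response with the highest score.
        return max(valid, key=lambda v: v.score).text
    if strategy == "weighted_vote":
        # Identical texts pool their weights; the heaviest text wins.
        totals: Dict[str, float] = defaultdict(float)
        for v in valid:
            totals[v.text] += v.weight
        return max(totals, key=lambda text: totals[text])
    raise ValueError(f"unknown federation strategy: {strategy}")
```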
#### 3. Network Binding (`main.py`)
**Purpose**: Secure local network access without internet exposure.
**Implementation**:
- `get_local_ip()` - Detects local network IP (192.x.x.x or 100.x.x.x)
- Binds to specific local IP instead of 0.0.0.0
- Falls back to localhost if not on private network
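
A minimal sketch of this binding logic, using only the standard library. It is not the code in `main.py`: the helper names are hypothetical, and reading "192.x.x.x or 100.x.x.x" as `192.168.0.0/16` plus the `100.64.0.0/10` shared range is an assumption.

```python
import ipaddress
import socket

# Assumed interpretation of "192.x.x.x or 100.x.x.x" from the list above.
ALLOWED_NETWORKS = (
    ipaddress.ip_network("192.168.0.0/16"),  # common home/office LAN range
    ipaddress.ip_network("100.64.0.0/10"),   # shared address space (RFC 6598)
)


def is_allowed_local_ip(ip: str) -> bool:
    """True if the address falls inside one of the allowed local ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in ALLOWED_NETWORKS)


def detect_outbound_ip() -> str:
    """Best-effort guess of the primary interface address."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # connect() on a UDP socket sends no packets; it only selects a route.
        sock.connect(("192.0.2.1", 80))  # TEST-NET-1 documentation address
        return sock.getsockname()[0]
    except OSError:
        return "127.0.0.1"
    finally:
        sock.close()


def choose_bind_host() -> str:
    """Bind to the local network IP when allowed, otherwise to localhost."""
    ip = detect_outbound_ip()
    return ip if is_allowed_local_ip(ip) else "127.0.0.1"
```

Binding to the value returned by `choose_bind_host()` rather than `0.0.0.0` keeps the API reachable from the LAN without listening on every interface.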
## ❌ What's Missing
### Critical Gap: No Integration
**The federation system exists as standalone modules but is NOT connected to the main application flow.**
**Specific Issues**:
1. **No CLI Flag**: No `--federation` or `--enable-federation` argument in `main.py`
2. **Discovery Never Starts**:
   - `SwarmDiscovery` class is imported in `network/__init__.py`
   - But never instantiated or started in `main.py`
   - `start_advertising()` and `start_discovery()` are never called
3. **Federation Never Starts**:
   - `FederatedSwarm` class exists but is never instantiated
   - `main.py` calls `swarm.generate()` directly
   - Should call `federated_swarm.generate_with_federation()` when enabled
4. **API Routes Not Registered**:
   - Federation endpoints exist in `federation.py` but aren't added to FastAPI router
   - Routes in `src/api/routes.py` don't include `/v1/federation/*`
5. **No Peer Management UI**:
   - No way to see discovered peers
   - No status dashboard for federation
   - No manual peer configuration
## File Structure
```
src/network/
├── __init__.py                      # Exports SwarmDiscovery, FederationClient, etc.
├── discovery.py                     # mDNS/Bonjour discovery service
│   ├── SwarmDiscovery               # Main discovery class
│   ├── PeerInfo                     # Peer information dataclass
│   └── create_discovery_service()   # Factory function
├── federation.py                    # Inter-swarm communication
│   ├── FederationClient             # HTTP client for peers
│   ├── FederatedSwarm               # Wraps swarm with federation
│   ├── PeerVote                     # Vote from peer
│   └── FederationResult             # Result of federated generation
└── (routes missing)                 # Should add federation routes

main.py                              # Should integrate federation here
├── Currently: Just runs local swarm
└── Should: Optionally run federated swarm with discovery
```
## Scope
### In Scope
- Automatic discovery of peers on same local network
- Distributed generation across multiple machines
- Consensus voting between local and peer responses
- Health checking and peer timeout handling
- Secure local network binding (no internet exposure)
### Out of Scope (Future)
- Internet-wide federation (would need authentication/encryption)
- Cross-platform federation (Mac ↔ Linux ↔ Windows)
- Peer authentication/authorization
- Encrypted peer communication
- WAN federation through NAT traversal
- Peer reputation/scoring system
## TODO
### Phase 1: Basic Integration (Minimum Viable)
1. **Add `--federation` CLI flag** to `main.py`
   - Add argument parser entry
   - Conditionally enable federation
2. **Integrate discovery in main flow**
   ```python
   # In main.py after swarm initialization:
   if args.federation:
       discovery = await create_discovery_service(args.port)
       await discovery.start_advertising(swarm_info)
       await discovery.start_discovery()
   ```
3. **Add federation API routes** to `src/api/routes.py`
   - `POST /v1/federation/vote`
   - `GET /v1/federation/health`
   - `GET /v1/federation/peers` (list discovered peers)
4. **Create FederatedSwarm wrapper**
   ```python
   # Replace: result = await swarm.generate(...)
   # With:
   if args.federation:
       federated = FederatedSwarm(swarm, discovery)
       result = await federated.generate_with_federation(...)
   else:
       result = await swarm.generate(...)
   ```
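
For item 1, the argument parser entry could look roughly like this. It is a sketch only: the existing parser in `main.py` is not shown in this document, so `build_parser`, the help texts, and the `--port` default are assumptions (`--auto` appears in the usage section and `args.port` in the snippet for item 2).

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser; only --federation is new."""
    parser = argparse.ArgumentParser(description="Local Swarm")
    # Flags assumed to exist already, included so the example is self-contained.
    parser.add_argument("--auto", action="store_true", help="existing flag (assumed)")
    parser.add_argument("--port", type=int, default=17615, help="API port")
    # Proposed flag: federation stays off unless explicitly requested.
    parser.add_argument(
        "--federation",
        action="store_true",
        help="enable peer discovery and federated generation",
    )
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the given arguments, or sys.argv when argv is None."""
    return build_parser().parse_args(argv)
```

Using `store_true` keeps the default behavior unchanged: without the flag, `args.federation` is `False` and the existing local-only path runs.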
### Phase 2: Polish
5. **Add peer status display**
   - Show discovered peers in startup banner
   - Display peer count in status
   - Log when peers join/leave
6. **Handle edge cases**
   - No peers available (fallback to local only)
   - All peers time out (graceful degradation)
   - Split-brain scenarios
7. **Configuration**
   - Config file support for federation settings
   - Manual peer list (bypass discovery)
   - Federation strategy selection
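
The "no peers available" and "all peers time out" cases from item 6 could be handled roughly as follows. This is a sketch under assumptions: peers are modeled as plain async callables, and `collect_peer_responses` and `generate_with_fallback` are hypothetical helpers, not part of `FederatedSwarm`.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence

# A generator is any async callable that maps a prompt to a response string.
Generate = Callable[[str], Awaitable[str]]


async def collect_peer_responses(
    prompt: str, peers: Sequence[Generate], timeout: float = 5.0
) -> List[str]:
    """Query every peer concurrently; drop peers that fail or exceed the timeout."""

    async def guarded(call: Generate) -> str:
        return await asyncio.wait_for(call(prompt), timeout)

    results = await asyncio.gather(
        *(guarded(peer) for peer in peers), return_exceptions=True
    )
    # Exceptions (timeouts, connection errors) come back as objects; skip them.
    return [r for r in results if isinstance(r, str)]


async def generate_with_fallback(
    prompt: str,
    local_generate: Generate,
    peers: Sequence[Generate],
    timeout: float = 5.0,
) -> List[str]:
    """Always return the local response, plus whatever peers answered in time."""
    local_task = asyncio.ensure_future(local_generate(prompt))
    peer_responses = await collect_peer_responses(prompt, peers, timeout)
    local_response = await local_task
    return [local_response] + peer_responses
```

With an empty peer list, or with every peer failing, the result contains only the local response, so the caller degrades to local-only generation without special-casing.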
### Phase 3: Testing
8. **Integration tests**
   - Two instances on same machine
   - Two instances on same network
   - Peer timeout handling
   - Consensus validation
## Usage (When Complete)
### Start Federated Mode
```bash
# On Mac 1 (192.168.1.100)
python main.py --auto --federation
# On Mac 2 (192.168.1.101)
python main.py --auto --federation
# Both will:
# 1. Start local API on 192.168.x.x:17615
# 2. Advertise via mDNS
# 3. Discover each other within 5-10 seconds
# 4. Distribute generation requests between them
```
### Expected Behavior
1. Both Macs advertise themselves via mDNS
2. Each discovers the other within 10 seconds
3. When a request comes in, both generate responses
4. Consensus algorithm picks best response
5. Result returned to client
## Benefits When Complete
- **More workers**: Combine instances across machines
- **Better consensus**: More responses = better selection
- **Load balancing**: Distribute generation across devices
- **Redundancy**: If one fails, others continue
- **Heterogeneous hardware**: Mix Macs, PCs, servers (once cross-platform federation, currently out of scope, is supported)
## Current Workaround
Until federation is integrated, there is no automatic peer discovery or coordination. In the meantime, you can:
1. Run instances independently on different machines
2. Point clients to specific instances manually