# Network Federation Status

## Overview
Local Swarm has a federation system designed to let multiple instances on the same network collaborate, enabling distributed consensus and load balancing across machines.

## Current Implementation Status

### ✅ What's Working

#### 1. Network Discovery (`src/network/discovery.py`)
**Purpose**: Automatic discovery of other Local Swarm instances on the local network using mDNS/Bonjour.

**Key Components**:
- `SwarmDiscovery` class - Main discovery service
- `PeerInfo` dataclass - Stores information about peer swarms
- `start_advertising()` - Announces this swarm to the network
- `start_discovery()` - Listens for other swarms on the network
- `create_discovery_service()` - Factory function to create discovery instance

**How It Works**:
- Uses mDNS service type: `_local-swarm._tcp.local.`
- Advertises on port 63323 (discovery) alongside the API port (17615)
- Broadcasts: version, instances, model_id, hardware_summary
- Peers time out after 60 seconds if not seen

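The 60-second peer timeout described above can be sketched as a small registry that evicts peers which have not been seen recently. This is a minimal illustration, not the code in `discovery.py`: the `PeerInfo` fields shown here and the `PeerRegistry` class are assumptions made for the example.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional

PEER_TIMEOUT_SECONDS = 60.0  # from this document: peers expire after 60 s unseen


@dataclass
class PeerInfo:
    """Hypothetical shape; the real PeerInfo in discovery.py may differ."""
    name: str
    host: str
    api_port: int = 17615
    last_seen: float = field(default_factory=time.monotonic)


class PeerRegistry:
    """Hypothetical helper: tracks peers and drops those past the timeout."""

    def __init__(self, timeout: float = PEER_TIMEOUT_SECONDS) -> None:
        self.timeout = timeout
        self._peers: Dict[str, PeerInfo] = {}

    def mark_seen(self, peer: PeerInfo, now: Optional[float] = None) -> None:
        # Called whenever an announcement for this peer arrives.
        peer.last_seen = time.monotonic() if now is None else now
        self._peers[peer.name] = peer

    def active_peers(self, now: Optional[float] = None) -> List[PeerInfo]:
        # Evict stale entries first, then return what is left.
        now = time.monotonic() if now is None else now
        self._peers = {
            name: p for name, p in self._peers.items()
            if now - p.last_seen <= self.timeout
        }
        return list(self._peers.values())
```

The optional `now` argument exists only so the expiry logic can be exercised without waiting a full minute.
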
#### 2. Federation Client (`src/network/federation.py`)
**Purpose**: Communication protocol between peer swarms.

**Key Components**:
- `FederationClient` class - HTTP client for peer communication
- `FederatedSwarm` class - Wraps local swarm with federation logic
- `request_vote()` - Gets generation results from peers
- `generate_with_federation()` - Coordinates distributed generation
- Federation strategies: `best_of_n`, `weighted_vote`, `first_valid`

**API Endpoints** (not yet exposed):
- `POST /v1/federation/vote` - Request generation from peer
- `GET /v1/federation/health` - Check peer health

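The document names three federation strategies but does not define them. The sketch below shows one plausible reading of each, purely for illustration: the `PeerVote` fields, the meaning of `score`, and the selection rules are all assumptions, and the real logic in `federation.py` may differ.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PeerVote:
    """Hypothetical fields; the real PeerVote in federation.py may differ."""
    peer: str
    text: str
    score: float  # assumed quality/confidence score, higher is better


def first_valid(votes: List[PeerVote]) -> Optional[PeerVote]:
    # Return the earliest vote with non-empty text.
    return next((v for v in votes if v.text.strip()), None)


def best_of_n(votes: List[PeerVote]) -> Optional[PeerVote]:
    # Return the non-empty vote with the highest score.
    valid = [v for v in votes if v.text.strip()]
    return max(valid, key=lambda v: v.score, default=None)


def weighted_vote(votes: List[PeerVote]) -> Optional[PeerVote]:
    # Sum scores per distinct answer; the answer with the largest total wins,
    # and its highest-scoring vote is returned as the representative.
    totals: Dict[str, float] = defaultdict(float)
    for v in votes:
        if v.text.strip():
            totals[v.text.strip()] += v.score
    if not totals:
        return None
    winner = max(totals, key=lambda text: totals[text])
    return best_of_n([v for v in votes if v.text.strip() == winner])
```

Under these assumptions, `weighted_vote` can pick an answer that several peers agree on even when a single dissenting peer has the highest individual score, which is the main practical difference from `best_of_n`.
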
#### 3. Network Binding (`main.py`)
**Purpose**: Secure local network access without internet exposure.

**Implementation**:
- `get_local_ip()` - Detects local network IP (192.x.x.x or 100.x.x.x)
- Binds to specific local IP instead of 0.0.0.0
- Falls back to localhost if not on private network

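The binding rule above (use a local network address, otherwise fall back to localhost) can be sketched with the standard library. This is not the actual `get_local_ip()` from `main.py`: the function names are hypothetical, and the sketch assumes the intended ranges are the RFC 1918 private blocks plus the 100.64.0.0/10 shared range, which is narrower than a literal "192.x.x.x or 100.x.x.x" match.

```python
import ipaddress
import socket

# 100.64.0.0/10 is the shared (CGNAT) range. ipaddress.is_private does not
# cover it, so it is checked explicitly here.
CGNAT = ipaddress.ip_network("100.64.0.0/10")
LOCALHOST = "127.0.0.1"


def choose_bind_address(candidate: str) -> str:
    """Return candidate if it is a private or CGNAT IPv4 address, else localhost."""
    try:
        ip = ipaddress.ip_address(candidate)
    except ValueError:
        return LOCALHOST
    if ip.version != 4 or ip.is_loopback or ip.is_link_local:
        return LOCALHOST
    if ip.is_private or ip in CGNAT:
        return candidate
    return LOCALHOST


def detect_outbound_ip() -> str:
    """Find the IP of the default outbound interface.

    connect() on a UDP socket only selects a route; no packet is sent.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        try:
            s.connect(("192.0.2.1", 80))  # TEST-NET-1 documentation address
            return s.getsockname()[0]
        except OSError:
            return LOCALHOST
```

A caller would combine the two as `choose_bind_address(detect_outbound_ip())` and pass the result as the server's bind host instead of `0.0.0.0`.
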
## ❌ What's Missing

### Critical Gap: No Integration
**The federation system exists as standalone modules but is NOT connected to the main application flow.**

**Specific Issues**:

1. **No CLI Flag**: No `--federation` or `--enable-federation` argument in `main.py`

2. **Discovery Never Starts**:
   - `SwarmDiscovery` class is imported in `network/__init__.py`
   - But it is never instantiated or started in `main.py`
   - `start_advertising()` and `start_discovery()` are never called

3. **Federation Never Starts**:
   - `FederatedSwarm` class exists but is never instantiated
   - `main.py` calls `swarm.generate()` directly
   - Should call `federated_swarm.generate_with_federation()` when enabled

4. **API Routes Not Registered**:
   - Federation endpoints exist in `federation.py` but aren't added to the FastAPI router
   - Routes in `src/api/routes.py` don't include `/v1/federation/*`

5. **No Peer Management UI**:
   - No way to see discovered peers
   - No status dashboard for federation
   - No manual peer configuration

## File Structure

```
src/network/
├── __init__.py                     # Exports SwarmDiscovery, FederationClient, etc.
├── discovery.py                    # mDNS/Bonjour discovery service
│   ├── SwarmDiscovery              # Main discovery class
│   ├── PeerInfo                    # Peer information dataclass
│   └── create_discovery_service()  # Factory function
├── federation.py                   # Inter-swarm communication
│   ├── FederationClient            # HTTP client for peers
│   ├── FederatedSwarm              # Wraps swarm with federation
│   ├── PeerVote                    # Vote from peer
│   └── FederationResult            # Result of federated generation
└── (routes missing)                # Should add federation routes

main.py                             # Should integrate federation here
├── Currently: Just runs local swarm
└── Should: Optionally run federated swarm with discovery
```

## Scope

### In Scope
- Automatic discovery of peers on same local network
- Distributed generation across multiple machines
- Consensus voting between local and peer responses
- Health checking and peer timeout handling
- Secure local network binding (no internet exposure)

### Out of Scope (Future)
- Internet-wide federation (would need authentication/encryption)
- Cross-platform federation (Mac ↔ Linux ↔ Windows)
- Peer authentication/authorization
- Encrypted peer communication
- WAN federation through NAT traversal
- Peer reputation/scoring system

## TODO

### Phase 1: Basic Integration (Minimum Viable)
1. **Add `--federation` CLI flag** to `main.py`
   - Add argument parser entry
   - Conditionally enable federation

2. **Integrate discovery in main flow**
   ```python
   # In main.py after swarm initialization:
   if args.federation:
       discovery = await create_discovery_service(args.port)
       await discovery.start_advertising(swarm_info)
       await discovery.start_discovery()
   ```

3. **Add federation API routes** to `src/api/routes.py`
   - `POST /v1/federation/vote`
   - `GET /v1/federation/health`
   - `GET /v1/federation/peers` (list discovered peers)

4. **Create FederatedSwarm wrapper**
   ```python
   # Replace: result = await swarm.generate(...)
   # With:
   if args.federation:
       federated = FederatedSwarm(swarm, discovery)
       result = await federated.generate_with_federation(...)
   else:
       result = await swarm.generate(...)
   ```

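For the first Phase 1 item, an opt-in flag could look roughly like the following `argparse` sketch. The `--auto` and `--port` options are mentioned elsewhere in this document, but their defaults and help text here are assumptions, as is the `build_parser()` helper; the real parser in `main.py` may be organized differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser; only the --federation entry is the proposed addition."""
    parser = argparse.ArgumentParser(prog="main.py")
    # Existing options, reproduced with assumed defaults for illustration.
    parser.add_argument("--auto", action="store_true",
                        help="start without interactive prompts (assumed meaning)")
    parser.add_argument("--port", type=int, default=17615,
                        help="API port")
    # Proposed addition: federation stays off unless explicitly requested.
    parser.add_argument("--federation", action="store_true",
                        help="enable peer discovery and federated generation")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print("federation enabled" if args.federation else "local only")
```

Using `store_true` keeps the current behavior as the default, so existing invocations continue to run a purely local swarm.
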
### Phase 2: Polish
5. **Add peer status display**
   - Show discovered peers in startup banner
   - Display peer count in status
   - Log when peers join/leave

6. **Handle edge cases**
   - No peers available (fallback to local only)
   - All peers time out (graceful degradation)
   - Split-brain scenarios

7. **Configuration**
   - Config file support for federation settings
   - Manual peer list (bypass discovery)
   - Federation strategy selection

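The "no peers" and "all peers time out" cases above can share one code path: always generate locally, and treat each peer response as optional. The sketch below illustrates that idea with `asyncio`; the `Generate` callable type, the `gather_responses()` helper, and the 5-second default are assumptions for the example, not part of the existing `FederatedSwarm` API.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional

# Hypothetical signature: takes a prompt, returns generated text.
Generate = Callable[[str], Awaitable[str]]


async def gather_responses(
    prompt: str,
    local: Generate,
    peers: List[Generate],
    peer_timeout: float = 5.0,
) -> List[str]:
    """Return the local response plus every peer response that arrived in time."""

    async def ask(peer: Generate) -> Optional[str]:
        # A slow or unreachable peer yields None instead of raising.
        try:
            return await asyncio.wait_for(peer(prompt), timeout=peer_timeout)
        except (asyncio.TimeoutError, OSError):
            return None

    local_result, *peer_results = await asyncio.gather(
        local(prompt), *(ask(p) for p in peers)
    )
    # With no peers, or with every peer failing, this is just [local_result].
    return [local_result] + [r for r in peer_results if r is not None]
```

Because the local result is always first in the returned list, a caller can fall back to it directly when the list has length one. Split-brain handling is not addressed by this sketch.
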
### Phase 3: Testing
8. **Integration tests**
   - Two instances on same machine
   - Two instances on same network
   - Peer timeout handling
   - Consensus validation

## Usage (When Complete)

### Start Federated Mode
```bash
# On Mac 1 (192.168.1.100)
python main.py --auto --federation

# On Mac 2 (192.168.1.101)
python main.py --auto --federation

# Both will:
# 1. Start local API on 192.168.x.x:17615
# 2. Advertise via mDNS
# 3. Discover each other within 5-10 seconds
# 4. Distribute generation requests between them
```

### Expected Behavior
1. Both Macs advertise themselves via mDNS
2. Each discovers the other within 10 seconds
3. When a request comes in, both generate responses
4. The consensus algorithm picks the best response
5. The result is returned to the client

## Benefits When Complete
- **More workers**: Combine instances across machines
- **Better consensus**: More candidate responses to select from
- **Load balancing**: Distribute generation across devices
- **Redundancy**: If one fails, others continue
- **Heterogeneous hardware**: Mix Macs, PCs, servers

## Current Workaround
Until federation is integrated, you can:
1. Run instances independently on different machines
2. Point clients to specific instances manually

This setup provides no automatic peer discovery or coordination.
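
Pointing clients at instances manually can be as simple as rotating through a fixed list on the client side. The sketch below is only an illustration: the two addresses are the example hosts used in this document, the `http://` scheme is an assumption, and `pick_instance()` is a hypothetical helper rather than anything shipped with Local Swarm.

```python
from itertools import cycle

# Hypothetical instance list; 17615 is the API port named in this document.
INSTANCES = [
    "http://192.168.1.100:17615",
    "http://192.168.1.101:17615",
]

_rotation = cycle(INSTANCES)


def pick_instance() -> str:
    """Return the next instance base URL in round-robin order."""
    return next(_rotation)
```

This spreads requests across machines but gives none of the consensus or failover behavior that federation is meant to provide.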