# Network Federation Status

## Overview

Local Swarm has a federation system designed to allow multiple instances to collaborate on the same network, enabling distributed consensus and load balancing across multiple machines.

## Current Implementation Status

### ✅ What's Working

#### 1. Network Discovery (`src/network/discovery.py`)

**Purpose**: Automatic discovery of other Local Swarm instances on the local network using mDNS/Bonjour.

**Key Components**:
- `SwarmDiscovery` class - Main discovery service
- `PeerInfo` dataclass - Stores information about peer swarms
- `start_advertising()` - Announces this swarm to the network
- `start_discovery()` - Listens for other swarms on the network
- `create_discovery_service()` - Factory function to create a discovery instance

**How It Works**:
- Uses mDNS service type: `_local-swarm._tcp.local.`
- Advertises on port 63323 (discovery) + API port (17615)
- Broadcasts: version, instances, model_id, hardware_summary
- Peers time out after 60 seconds if not seen

#### 2. Federation Client (`src/network/federation.py`)

**Purpose**: Communication protocol between peer swarms.

**Key Components**:
- `FederationClient` class - HTTP client for peer communication
- `FederatedSwarm` class - Wraps local swarm with federation logic
- `request_vote()` - Gets generation results from peers
- `generate_with_federation()` - Coordinates distributed generation
- Federation strategies: `best_of_n`, `weighted_vote`, `first_valid`

**API Endpoints** (not yet exposed):
- `POST /v1/federation/vote` - Request generation from peer
- `GET /v1/federation/health` - Check peer health

#### 3. Network Binding (`main.py`)

**Purpose**: Secure local network access without internet exposure.


**Implementation**:
- `get_local_ip()` - Detects local network IP (192.x.x.x or 100.x.x.x)
- Binds to specific local IP instead of 0.0.0.0
- Falls back to localhost if not on private network

## ❌ What's Missing

### Critical Gap: No Integration

**The federation system exists as standalone modules but is NOT connected to the main application flow.**

**Specific Issues**:

1. **No CLI Flag**: No `--federation` or `--enable-federation` argument in `main.py`

2. **Discovery Never Starts**:
   - `SwarmDiscovery` class is imported in `network/__init__.py`
   - But never instantiated or started in `main.py`
   - `start_advertising()` and `start_discovery()` are never called

3. **Federation Never Starts**:
   - `FederatedSwarm` class exists but is never instantiated
   - `main.py` calls `swarm.generate()` directly
   - Should call `federated_swarm.generate_with_federation()` when enabled

4. **API Routes Not Registered**:
   - Federation endpoints exist in `federation.py` but aren't added to FastAPI router
   - Routes in `src/api/routes.py` don't include `/v1/federation/*`

5. **No Peer Management UI**:
   - No way to see discovered peers
   - No status dashboard for federation
   - No manual peer configuration

## File Structure

```
src/network/
├── __init__.py                     # Exports SwarmDiscovery, FederationClient, etc.
├── discovery.py                    # mDNS/Bonjour discovery service
│   ├── SwarmDiscovery              # Main discovery class
│   ├── PeerInfo                    # Peer information dataclass
│   └── create_discovery_service()  # Factory function
├── federation.py                   # Inter-swarm communication
│   ├── FederationClient            # HTTP client for peers
│   ├── FederatedSwarm              # Wraps swarm with federation
│   ├── PeerVote                    # Vote from peer
│   └── FederationResult            # Result of federated generation
└── (routes missing)                # Should add federation routes

main.py                             # Should integrate federation here
└── Currently: Just runs local swarm
└── Should: Optionally run federated swarm with discovery
```

## Scope

### In Scope
- Automatic discovery of peers on same local network
- Distributed generation across multiple machines
- Consensus voting between local and peer responses
- Health checking and peer timeout handling
- Secure local network binding (no internet exposure)

### Out of Scope (Future)
- Internet-wide federation (would need authentication/encryption)
- Cross-platform federation (Mac ↔ Linux ↔ Windows)
- Peer authentication/authorization
- Encrypted peer communication
- WAN federation through NAT traversal
- Peer reputation/scoring system

## TODO

### Phase 1: Basic Integration (Minimum Viable)

1. **Add `--federation` CLI flag** to `main.py`
   - Add argument parser entry
   - Conditionally enable federation

2. **Integrate discovery in main flow**
   ```python
   # In main.py after swarm initialization:
   if args.federation:
       discovery = await create_discovery_service(args.port)
       await discovery.start_advertising(swarm_info)
       await discovery.start_discovery()
   ```

3. **Add federation API routes** to `src/api/routes.py`
   - `POST /v1/federation/vote`
   - `GET /v1/federation/health`
   - `GET /v1/federation/peers` (list discovered peers)

4. **Create FederatedSwarm wrapper**
   ```python
   # Replace:
   result = await swarm.generate(...)

   # With:
   if args.federation:
       federated = FederatedSwarm(swarm, discovery)
       result = await federated.generate_with_federation(...)
   else:
       result = await swarm.generate(...)
   ```

### Phase 2: Polish

5. **Add peer status display**
   - Show discovered peers in startup banner
   - Display peer count in status
   - Log when peers join/leave

6. **Handle edge cases**
   - No peers available (fallback to local only)
   - All peers time out (graceful degradation)
   - Split-brain scenarios

7. **Configuration**
   - Config file support for federation settings
   - Manual peer list (bypass discovery)
   - Federation strategy selection

### Phase 3: Testing

8. **Integration tests**
   - Two instances on same machine
   - Two instances on same network
   - Peer timeout handling
   - Consensus validation

## Usage (When Complete)

### Start Federated Mode

```bash
# On Mac 1 (192.168.1.100)
python main.py --auto --federation

# On Mac 2 (192.168.1.101)
python main.py --auto --federation

# Both will:
# 1. Start local API on 192.168.x.x:17615
# 2. Advertise via mDNS
# 3. Discover each other within 5-10 seconds
# 4. Distribute generation requests between them
```

### Expected Behavior

1. Both Macs advertise themselves via mDNS
2. Each discovers the other within 10 seconds
3. When a request comes in, both generate responses
4. Consensus algorithm picks best response
5. Result returned to client

## Benefits When Complete

- **More workers**: Combine instances across machines
- **Better consensus**: More responses = better selection
- **Load balancing**: Distribute generation across devices
- **Redundancy**: If one fails, others continue
- **Heterogeneous hardware**: Mix Macs, PCs, servers

## Current Workaround

Until federation is integrated, you can:

1. Run instances independently on different machines
2. Point clients to specific instances manually

Note that this setup provides no automatic peer discovery or coordination.
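
The manual workaround can be made a little less brittle on the client side. The following is a minimal sketch and not part of Local Swarm: `first_available`, `fake_send`, and `INSTANCES` are hypothetical names, the `http://` scheme is an assumption, and the addresses reuse the example IPs and API port from this document. The `send` callable stands in for whatever request your client actually makes, so no endpoint path is assumed here.

```python
from typing import Callable, Dict, Sequence, Tuple, TypeVar

T = TypeVar("T")


def first_available(
    instances: Sequence[str], send: Callable[[str], T]
) -> Tuple[str, T]:
    """Try each manually configured instance in order.

    Returns (base_url, response) from the first instance whose `send`
    call succeeds. This is plain client-side failover, not federation:
    there is no discovery, voting, or consensus involved.
    """
    errors: Dict[str, Exception] = {}
    for base_url in instances:
        try:
            return base_url, send(base_url)
        except OSError as exc:
            # OSError covers ConnectionError, TimeoutError and
            # urllib.error.URLError, the usual "instance is down" failures.
            errors[base_url] = exc
    raise ConnectionError(f"no instance reachable: {errors}")


# Hypothetical stand-in for a real HTTP call, so the sketch runs offline.
def fake_send(base_url: str) -> str:
    if base_url.endswith("100:17615"):
        raise ConnectionRefusedError("instance down")
    return f"response from {base_url}"


# Assumed addresses, taken from the usage example in this document.
INSTANCES = ["http://192.168.1.100:17615", "http://192.168.1.101:17615"]

used, reply = first_available(INSTANCES, fake_send)
print(used, reply)
```

In real use, `send` would wrap your HTTP client of choice and should set its own timeout so that an unreachable machine does not stall the loop.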