diff --git a/NETWORK.md b/NETWORK.md
new file mode 100644
index 0000000..d8f6214
--- /dev/null
+++ b/NETWORK.md
@@ -0,0 +1,204 @@
+# Network Federation Status
+
+## Overview
+Local Swarm has a federation system designed to let multiple instances on the same network collaborate, enabling distributed consensus and load balancing across machines.
+
+## Current Implementation Status
+
+### ✅ What's Working
+
+#### 1. Network Discovery (`src/network/discovery.py`)
+**Purpose**: Automatic discovery of other Local Swarm instances on the local network using mDNS/Bonjour.
+
+**Key Components**:
+- `SwarmDiscovery` class - Main discovery service
+- `PeerInfo` dataclass - Stores information about peer swarms
+- `start_advertising()` - Announces this swarm to the network
+- `start_discovery()` - Listens for other swarms on the network
+- `create_discovery_service()` - Factory function to create discovery instance
+
+**How It Works**:
+- Uses mDNS service type: `_local-swarm._tcp.local.`
+- Advertises on port 63323 (discovery) + API port (17615)
+- Broadcasts: version, instances, model_id, hardware_summary
+- Peers time out after 60 seconds if not seen
+
+#### 2. Federation Client (`src/network/federation.py`)
+**Purpose**: Communication protocol between peer swarms.
+
+**Key Components**:
+- `FederationClient` class - HTTP client for peer communication
+- `FederatedSwarm` class - Wraps local swarm with federation logic
+- `request_vote()` - Gets generation results from peers
+- `generate_with_federation()` - Coordinates distributed generation
+- Federation strategies: `best_of_n`, `weighted_vote`, `first_valid`
+
+**API Endpoints** (not yet exposed):
+- `POST /v1/federation/vote` - Request generation from peer
+- `GET /v1/federation/health` - Check peer health
+
+#### 3. Network Binding (`main.py`)
+**Purpose**: Secure local network access without internet exposure.
+
+**Implementation**:
+- `get_local_ip()` - Detects local network IP (192.x.x.x or 100.x.x.x)
+- Binds to the specific local IP instead of 0.0.0.0
+- Falls back to localhost if not on a private network
+
+### ❌ What's Missing
+
+#### Critical Gap: No Integration
+**The federation system exists as standalone modules but is NOT connected to the main application flow.**
+
+**Specific Issues**:
+
+1. **No CLI Flag**: No `--federation` or `--enable-federation` argument in `main.py`
+
+2. **Discovery Never Starts**:
+   - `SwarmDiscovery` class is imported in `network/__init__.py`
+   - But it is never instantiated or started in `main.py`
+   - `start_advertising()` and `start_discovery()` are never called
+
+3. **Federation Never Starts**:
+   - `FederatedSwarm` class exists but is never instantiated
+   - `main.py` calls `swarm.generate()` directly
+   - It should call `federated_swarm.generate_with_federation()` when federation is enabled
+
+4. **API Routes Not Registered**:
+   - Federation endpoints exist in `federation.py` but aren't added to the FastAPI router
+   - Routes in `src/api/routes.py` don't include `/v1/federation/*`
+
+5. **No Peer Management UI**:
+   - No way to see discovered peers
+   - No status dashboard for federation
+   - No manual peer configuration
+
+## File Structure
+
+```
+src/network/
+├── __init__.py          # Exports SwarmDiscovery, FederationClient, etc.
+├── discovery.py         # mDNS/Bonjour discovery service
+│   ├── SwarmDiscovery   # Main discovery class
+│   ├── PeerInfo         # Peer information dataclass
+│   └── create_discovery_service()  # Factory function
+├── federation.py        # Inter-swarm communication
+│   ├── FederationClient # HTTP client for peers
+│   ├── FederatedSwarm   # Wraps swarm with federation
+│   ├── PeerVote         # Vote from peer
+│   └── FederationResult # Result of federated generation
+└── (routes missing)     # Should add federation routes
+
+main.py                  # Should integrate federation here
+  └── Currently: Just runs local swarm
+  └── Should: Optionally run federated swarm with discovery
+```
+
+## Scope
+
+### In Scope
+- Automatic discovery of peers on the same local network
+- Distributed generation across multiple machines
+- Consensus voting between local and peer responses
+- Health checking and peer timeout handling
+- Secure local network binding (no internet exposure)
+
+### Out of Scope (Future)
+- Internet-wide federation (would need authentication/encryption)
+- Cross-platform federation (Mac ↔ Linux ↔ Windows)
+- Peer authentication/authorization
+- Encrypted peer communication
+- WAN federation through NAT traversal
+- Peer reputation/scoring system
+
+## TODO
+
+### Phase 1: Basic Integration (Minimum Viable)
+1. **Add `--federation` CLI flag** to `main.py`
+   - Add argument parser entry
+   - Conditionally enable federation
+
+2. **Integrate discovery in main flow**
+   ```python
+   # In main.py after swarm initialization:
+   if args.federation:
+       discovery = await create_discovery_service(args.port)
+       await discovery.start_advertising(swarm_info)
+       await discovery.start_discovery()
+   ```
+
+3. **Add federation API routes** to `src/api/routes.py`
+   - `POST /v1/federation/vote`
+   - `GET /v1/federation/health`
+   - `GET /v1/federation/peers` (list discovered peers)
+
+4. **Create FederatedSwarm wrapper**
+   ```python
+   # Replace: result = await swarm.generate(...)
+   # With:
+   if args.federation:
+       federated = FederatedSwarm(swarm, discovery)
+       result = await federated.generate_with_federation(...)
+   else:
+       result = await swarm.generate(...)
+   ```
+
+### Phase 2: Polish
+5. **Add peer status display**
+   - Show discovered peers in startup banner
+   - Display peer count in status
+   - Log when peers join/leave
+
+6. **Handle edge cases**
+   - No peers available (fall back to local only)
+   - All peers time out (graceful degradation)
+   - Split-brain scenarios
+
+7. **Configuration**
+   - Config file support for federation settings
+   - Manual peer list (bypass discovery)
+   - Federation strategy selection
+
+### Phase 3: Testing
+8. **Integration tests**
+   - Two instances on same machine
+   - Two instances on same network
+   - Peer timeout handling
+   - Consensus validation
+
+## Usage (When Complete)
+
+### Start Federated Mode
+```bash
+# On Mac 1 (192.168.1.100)
+python main.py --auto --federation
+
+# On Mac 2 (192.168.1.101)
+python main.py --auto --federation
+
+# Both will:
+# 1. Start local API on 192.168.x.x:17615
+# 2. Advertise via mDNS
+# 3. Discover each other within 5-10 seconds
+# 4. Distribute generation requests between them
+```
+
+### Expected Behavior
+1. Both Macs advertise themselves via mDNS
+2. Each discovers the other within 10 seconds
+3. When a request comes in, both generate responses
+4. The consensus algorithm picks the best response
+5. The result is returned to the client
+
+## Benefits When Complete
+- **More workers**: Combine instances across machines
+- **Better consensus**: More responses = better selection
+- **Load balancing**: Distribute generation across devices
+- **Redundancy**: If one fails, others continue
+- **Heterogeneous hardware**: Mix Macs, PCs, servers
+
+## Current Workaround
+Until federation is integrated, you can:
+1. Run instances independently on different machines
+2. Point clients to specific instances manually
+3. Handle coordination yourself; there is no automatic peer discovery
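
Review note on the 60-second peer timeout listed under "How It Works": the doc does not show how stale peers are dropped, so here is a minimal sketch of one way it could be tracked. This is not the code in `src/network/discovery.py`. `PeerRegistry`, `mark_seen`, `prune`, and `active_peers` are hypothetical names introduced only for illustration, and the choice of a monotonic clock is an assumption.

```python
import time
from dataclasses import dataclass, field
from typing import Optional

PEER_TIMEOUT_SECONDS = 60.0  # value taken from the "How It Works" notes


@dataclass
class PeerRegistry:
    """Hypothetical helper that remembers when each peer was last seen."""

    timeout: float = PEER_TIMEOUT_SECONDS
    last_seen: dict = field(default_factory=dict)  # peer name -> timestamp

    def mark_seen(self, peer_name: str, now: Optional[float] = None) -> None:
        """Record a sign of life (for example an mDNS announcement) from a peer."""
        self.last_seen[peer_name] = time.monotonic() if now is None else now

    def prune(self, now: Optional[float] = None) -> list:
        """Remove peers silent for longer than `timeout`; return their names."""
        current = time.monotonic() if now is None else now
        expired = [
            name
            for name, seen in self.last_seen.items()
            if current - seen > self.timeout
        ]
        for name in expired:
            del self.last_seen[name]
        return expired

    def active_peers(self) -> list:
        """Names of peers that have not been pruned, in sorted order."""
        return sorted(self.last_seen)
```

Passing `now` explicitly keeps the expiry logic testable without sleeping; in a running service, `prune()` would presumably be called on a timer and its return value used for the "log when peers join/leave" item in Phase 2.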