Network Federation Status
Overview
Local Swarm has a federation system designed to allow multiple instances to collaborate on the same network, enabling distributed consensus and load balancing across multiple machines.
Current Implementation Status
✅ What's Working
1. Network Discovery (src/network/discovery.py)
Purpose: Automatic discovery of other Local Swarm instances on the local network using mDNS/Bonjour.
Key Components:
- SwarmDiscovery class - Main discovery service
- PeerInfo dataclass - Stores information about peer swarms
- start_advertising() - Announces this swarm to the network
- start_discovery() - Listens for other swarms on the network
- create_discovery_service() - Factory function to create discovery instance
How It Works:
- Uses mDNS service type: _local-swarm._tcp.local.
- Advertises on port 63323 (discovery) + API port (17615)
- Broadcasts: version, instances, model_id, hardware_summary
- Peers timeout after 60 seconds if not seen
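The 60-second peer timeout described above can be sketched as a small in-memory table. This is a minimal illustration, not the code in discovery.py: `PeerRecord` and `PeerTable` are hypothetical names, and the fields are assumed from the broadcast properties listed above (the real PeerInfo dataclass may differ).

```python
import time
from dataclasses import dataclass, field

PEER_TIMEOUT_SECONDS = 60.0  # from the text: peers expire after 60 s unseen


@dataclass
class PeerRecord:
    """Hypothetical stand-in for PeerInfo; the real fields may differ."""
    name: str
    host: str
    api_port: int
    version: str = ""
    instances: int = 0
    model_id: str = ""
    hardware_summary: str = ""
    last_seen: float = field(default_factory=time.monotonic)


class PeerTable:
    """Tracks peers and drops those not seen within the timeout."""

    def __init__(self, timeout: float = PEER_TIMEOUT_SECONDS):
        self.timeout = timeout
        self._peers = {}

    def mark_seen(self, peer: PeerRecord, now=None) -> None:
        # Called whenever an mDNS announcement for this peer arrives.
        peer.last_seen = time.monotonic() if now is None else now
        self._peers[peer.name] = peer

    def prune(self, now=None) -> list:
        # Remove expired peers and return their names.
        now = time.monotonic() if now is None else now
        expired = [
            name for name, peer in self._peers.items()
            if now - peer.last_seen > self.timeout
        ]
        for name in expired:
            del self._peers[name]
        return expired

    def active(self, now=None) -> list:
        # Peers still considered alive, in insertion order.
        self.prune(now)
        return list(self._peers.values())
```

The optional `now` argument exists only so the expiry logic can be exercised without waiting a full minute.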
2. Federation Client (src/network/federation.py)
Purpose: Communication protocol between peer swarms.
Key Components:
- FederationClient class - HTTP client for peer communication
- FederatedSwarm class - Wraps local swarm with federation logic
- request_vote() - Gets generation results from peers
- generate_with_federation() - Coordinates distributed generation
- Federation strategies: best_of_n, weighted_vote, first_valid
API Endpoints (not yet exposed):
- POST /v1/federation/vote - Request generation from peer
- GET /v1/federation/health - Check peer health
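This document names the three federation strategies but does not define them. The sketch below shows one plausible reading of each, purely as an assumption: the `Vote` shape, the `score` field, and the selection rules are hypothetical and may not match federation.py.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Vote:
    """Hypothetical vote shape; the real PeerVote fields are not shown here."""
    source: str
    text: str
    score: float  # assumed quality/confidence score


def best_of_n(votes: list) -> Optional[Vote]:
    # Assumed meaning: pick the single highest-scoring response.
    return max(votes, key=lambda v: v.score, default=None)


def weighted_vote(votes: list) -> Optional[Vote]:
    # Assumed meaning: identical responses pool their scores,
    # and the response with the largest pooled score wins.
    totals = defaultdict(float)
    for vote in votes:
        totals[vote.text] += vote.score
    if not totals:
        return None
    winner = max(totals, key=totals.get)
    return next(v for v in votes if v.text == winner)


def first_valid(votes: list) -> Optional[Vote]:
    # Assumed meaning: take the first non-empty response in arrival order.
    return next((v for v in votes if v.text.strip()), None)
```

Under these assumptions, best_of_n favors a single strong answer, weighted_vote favors agreement between peers, and first_valid favors latency.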
3. Network Binding (main.py)
Purpose: Secure local network access without internet exposure.
Implementation:
- get_local_ip() - Detects local network IP (192.x.x.x or 100.x.x.x)
- Binds to specific local IP instead of 0.0.0.0
- Falls back to localhost if not on private network
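The binding rule above can be sketched as follows. This is an illustration rather than the actual get_local_ip() in main.py: the function names are hypothetical, and the sketch assumes that "192.x.x.x or 100.x.x.x" means the private 192.168.0.0/16 range and the 100.64.0.0/10 shared range.

```python
import ipaddress
import socket

# Assumed interpretation of "192.x.x.x or 100.x.x.x" from the text.
ALLOWED_NETWORKS = (
    ipaddress.ip_network("192.168.0.0/16"),
    ipaddress.ip_network("100.64.0.0/10"),
)


def is_local_network_ip(ip: str) -> bool:
    """True if ip parses and falls inside one of the allowed ranges."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return False
    return any(addr in net for net in ALLOWED_NETWORKS)


def detect_outbound_ip() -> str:
    """Ask the OS which source address it would use for outbound traffic."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # connect() on a UDP socket only selects a route; no packet is sent.
        sock.connect(("192.0.2.1", 9))
        return sock.getsockname()[0]
    except OSError:
        return "127.0.0.1"
    finally:
        sock.close()


def choose_bind_host(detected=None) -> str:
    """Bind to the local network IP, or fall back to localhost."""
    ip = detect_outbound_ip() if detected is None else detected
    return ip if is_local_network_ip(ip) else "127.0.0.1"
```

Binding to one specific address rather than 0.0.0.0 means the API is reachable only through that interface, which is what keeps it off any other network the machine is attached to.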
❌ What's Missing
Critical Gap: No Integration
The federation system exists as standalone modules but is NOT connected to the main application flow.
Specific Issues:
- No CLI Flag: No --federation or --enable-federation argument in main.py
- Discovery Never Starts:
  - SwarmDiscovery class is imported in network/__init__.py
  - But never instantiated or started in main.py
  - start_advertising() and start_discovery() are never called
- Federation Never Starts:
  - FederatedSwarm class exists but is never instantiated
  - main.py calls swarm.generate() directly
  - Should call federated_swarm.generate_with_federation() when enabled
- API Routes Not Registered:
  - Federation endpoints exist in federation.py but aren't added to FastAPI router
  - Routes in src/api/routes.py don't include /v1/federation/*
- No Peer Management UI:
  - No way to see discovered peers
  - No status dashboard for federation
  - No manual peer configuration
File Structure
src/network/
├── __init__.py # Exports SwarmDiscovery, FederationClient, etc.
├── discovery.py # mDNS/Bonjour discovery service
│ ├── SwarmDiscovery # Main discovery class
│ ├── PeerInfo # Peer information dataclass
│ └── create_discovery_service() # Factory function
├── federation.py # Inter-swarm communication
│ ├── FederationClient # HTTP client for peers
│ ├── FederatedSwarm # Wraps swarm with federation
│ ├── PeerVote # Vote from peer
│ └── FederationResult # Result of federated generation
└── (routes missing) # Should add federation routes
main.py # Should integrate federation here
└── Currently: Just runs local swarm
└── Should: Optionally run federated swarm with discovery
Scope
In Scope
- Automatic discovery of peers on same local network
- Distributed generation across multiple machines
- Consensus voting between local and peer responses
- Health checking and peer timeout handling
- Secure local network binding (no internet exposure)
Out of Scope (Future)
- Internet-wide federation (would need authentication/encryption)
- Cross-platform federation (Mac ↔ Linux ↔ Windows)
- Peer authentication/authorization
- Encrypted peer communication
- WAN federation through NAT traversal
- Peer reputation/scoring system
TODO
Phase 1: Basic Integration (Minimum Viable)
- Add --federation CLI flag to main.py
  - Add argument parser entry
  - Conditionally enable federation
- Integrate discovery in main flow

      # In main.py after swarm initialization:
      if args.federation:
          discovery = await create_discovery_service(args.port)
          await discovery.start_advertising(swarm_info)
          await discovery.start_discovery()

- Add federation API routes to src/api/routes.py
  - POST /v1/federation/vote
  - GET /v1/federation/health
  - GET /v1/federation/peers (list discovered peers)
- Create FederatedSwarm wrapper

      # Replace:
      result = await swarm.generate(...)
      # With:
      if args.federation:
          federated = FederatedSwarm(swarm, discovery)
          result = await federated.generate_with_federation(...)
      else:
          result = await swarm.generate(...)
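The CLI flag from the first item could look roughly like this. It is a sketch under assumptions: `build_parser` is a hypothetical helper, the existing parser in main.py is not shown in this document, and the `--auto` and `--port` entries are included only because they appear elsewhere in this file.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser; the real main.py likely defines more options."""
    parser = argparse.ArgumentParser(description="Local Swarm")
    parser.add_argument(
        "--auto",
        action="store_true",
        help="existing flag seen in the usage examples (meaning assumed)",
    )
    parser.add_argument(
        "--port",
        type=int,
        default=17615,  # API port mentioned in this document
        help="API port",
    )
    # Both spellings mentioned in this document map to one boolean.
    parser.add_argument(
        "--federation",
        "--enable-federation",
        dest="federation",
        action="store_true",
        help="discover peers via mDNS and federate generation",
    )
    return parser
```

Because the flag defaults to False, existing invocations keep running a purely local swarm until federation is requested explicitly.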
Phase 2: Polish
- Add peer status display
  - Show discovered peers in startup banner
  - Display peer count in status
  - Log when peers join/leave
- Handle edge cases
  - No peers available (fallback to local only)
  - All peers timeout (graceful degradation)
  - Split-brain scenarios
- Configuration
  - Config file support for federation settings
  - Manual peer list (bypass discovery)
  - Federation strategy selection
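The first two edge cases (no peers, all peers timing out) can be handled with one pattern: always produce the local result and treat every peer result as optional. The sketch below uses only asyncio; the function names and the 5-second default timeout are assumptions, not part of the existing FederatedSwarm API.

```python
import asyncio


async def gather_peer_results(peer_calls, timeout: float = 5.0) -> list:
    """Run peer coroutines concurrently; drop any that fail or time out."""

    async def guarded(call):
        try:
            return await asyncio.wait_for(call(), timeout)
        except (asyncio.TimeoutError, OSError):
            # Unreachable or slow peers are skipped, not fatal.
            return None

    results = await asyncio.gather(*(guarded(call) for call in peer_calls))
    return [r for r in results if r is not None]


async def generate_with_fallback(local_generate, peer_calls, timeout: float = 5.0) -> list:
    """Return the local result first, followed by whatever peers answered."""
    local_task = asyncio.create_task(local_generate())
    peer_results = await gather_peer_results(peer_calls, timeout)
    local_result = await local_task
    return [local_result, *peer_results]
```

With an empty peer list or with every peer timing out, the returned list contains only the local result, so the consensus step degrades to local-only generation. Split-brain handling is not covered by this sketch.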
Phase 3: Testing
- Integration tests
  - Two instances on same machine
  - Two instances on same network
  - Peer timeout handling
  - Consensus validation
Usage (When Complete)
Start Federated Mode
# On Mac 1 (192.168.1.100)
python main.py --auto --federation
# On Mac 2 (192.168.1.101)
python main.py --auto --federation
# Both will:
# 1. Start local API on 192.168.x.x:17615
# 2. Advertise via mDNS
# 3. Discover each other within 5-10 seconds
# 4. Distribute generation requests between them
Expected Behavior
- Both Macs advertise themselves via mDNS
- Each discovers the other within 10 seconds
- When a request comes in, both generate responses
- Consensus algorithm picks best response
- Result returned to client
Benefits When Complete
- More workers: Combine instances across machines
- Better consensus: More responses = better selection
- Load balancing: Distribute generation across devices
- Redundancy: If one fails, others continue
- Heterogeneous hardware: Mix Macs, PCs, servers
Current Workaround
Until federation is integrated, you can:
- Run instances independently on different machines
- Point clients to specific instances manually
Note that this setup provides no automatic peer discovery or coordination.