sleepy 429a3a4e3b Add NETWORK.md documenting federation status and TODO
Documents the current state of network federation:
- What's working (discovery, federation client, network binding)
- What's missing (integration in main.py)
- Relevant files and functions
- Scope and limitations
- Comprehensive TODO list for implementation

Federation exists but isn't wired up to the main application flow.
2026-02-24 04:14:18 +01:00


Network Federation Status

Overview

Local Swarm has a federation system designed to allow multiple instances to collaborate on the same network, enabling distributed consensus and load balancing across multiple machines.

Current Implementation Status

What's Working

1. Network Discovery (src/network/discovery.py)

Purpose: Automatic discovery of other Local Swarm instances on the local network using mDNS/Bonjour.

Key Components:

  • SwarmDiscovery class - Main discovery service
  • PeerInfo dataclass - Stores information about peer swarms
  • start_advertising() - Announces this swarm to the network
  • start_discovery() - Listens for other swarms on the network
  • create_discovery_service() - Factory function to create discovery instance

How It Works:

  • Uses mDNS service type: _local-swarm._tcp.local.
  • Advertises discovery port 63323 alongside the API port (17615)
  • Broadcasts: version, instances, model_id, hardware_summary
  • Peers time out after 60 seconds without being seen
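The expiry rule above can be sketched as a small peer table. This is a minimal sketch, not the code in discovery.py: `Peer` and `PeerTable` are hypothetical names standing in for PeerInfo and the bookkeeping inside SwarmDiscovery. The mDNS browsing itself, which would call `seen()` on each announcement, is left out.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field

SERVICE_TYPE = "_local-swarm._tcp.local."  # service type advertised via mDNS
PEER_TIMEOUT_S = 60.0  # peers not seen for longer than this are dropped


@dataclass
class Peer:
    """Hypothetical stand-in for PeerInfo: what a peer broadcasts, plus last-seen time."""
    name: str
    host: str
    api_port: int
    # Broadcast fields such as version, instances, model_id, hardware_summary
    properties: dict[str, str] = field(default_factory=dict)
    last_seen: float = 0.0


class PeerTable:
    """Tracks discovered peers and expires the ones that stop announcing."""

    def __init__(self, timeout_s: float = PEER_TIMEOUT_S) -> None:
        self.timeout_s = timeout_s
        self._peers: dict[str, Peer] = {}

    def seen(self, peer: Peer, now: float | None = None) -> None:
        # Called whenever an announcement for this peer arrives; refreshes its timestamp.
        peer.last_seen = time.monotonic() if now is None else now
        self._peers[peer.name] = peer

    def prune(self, now: float | None = None) -> list[str]:
        # Remove peers whose last announcement is older than the timeout.
        now = time.monotonic() if now is None else now
        expired = [n for n, p in self._peers.items() if now - p.last_seen > self.timeout_s]
        for name in expired:
            del self._peers[name]
        return expired

    def active(self) -> list[Peer]:
        return list(self._peers.values())
```

Calling `prune()` periodically (for example from the discovery loop) keeps the peer list in line with the 60-second rule; a monotonic clock avoids surprises when the wall clock changes.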

2. Federation Client (src/network/federation.py)

Purpose: Communication protocol between peer swarms.

Key Components:

  • FederationClient class - HTTP client for peer communication
  • FederatedSwarm class - Wraps local swarm with federation logic
  • request_vote() - Gets generation results from peers
  • generate_with_federation() - Coordinates distributed generation
  • Federation strategies: best_of_n, weighted_vote, first_valid
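The three strategy names suggest the following selection rules. This is an illustrative sketch only: the `Vote` fields (`score`, `weight`, `valid`) are assumptions rather than the actual PeerVote definition, and the real strategies in federation.py may score responses or break ties differently.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Vote:
    """Hypothetical stand-in for PeerVote; field names are assumptions."""
    source: str          # "local" or a peer name
    text: str            # generated response
    score: float = 0.0   # quality score from some evaluator
    weight: float = 1.0  # e.g. proportional to a peer's instance count
    valid: bool = True   # passed validation (e.g. non-empty, well-formed)


def best_of_n(votes: list[Vote]) -> Vote:
    # Keep the single highest-scoring response (first one wins on ties).
    return max(votes, key=lambda v: v.score)


def weighted_vote(votes: list[Vote]) -> Vote:
    # Sum weights per distinct (whitespace-trimmed) response; the heaviest text wins.
    totals: dict[str, float] = defaultdict(float)
    for v in votes:
        totals[v.text.strip()] += v.weight
    winner = max(totals, key=totals.__getitem__)
    return next(v for v in votes if v.text.strip() == winner)


def first_valid(votes: list[Vote]) -> Vote | None:
    # Votes are assumed to be in arrival order; take the first that validates.
    return next((v for v in votes if v.valid), None)


STRATEGIES = {
    "best_of_n": best_of_n,
    "weighted_vote": weighted_vote,
    "first_valid": first_valid,
}
```

The trade-off between them: weighted_vote rewards agreement between machines, best_of_n is only as good as its scoring function, and first_valid minimizes latency at the cost of ignoring slower responses.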

API Endpoints (not yet exposed):

  • POST /v1/federation/vote - Request generation from peer
  • GET /v1/federation/health - Check peer health
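A peer-side client for these two endpoints could look like the sketch below, using only the standard library. The JSON payload fields (`prompt`, `max_tokens`) and the function names are hypothetical; FederationClient defines the real request schema.

```python
from __future__ import annotations

import http.client
import json
import urllib.request

API_PORT = 17615  # API port from the discovery section


def build_vote_request(host: str, prompt: str, max_tokens: int = 256,
                       port: int = API_PORT) -> urllib.request.Request:
    # Payload fields are hypothetical; the real schema lives in federation.py.
    body = json.dumps({"prompt": prompt, "max_tokens": max_tokens}).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}:{port}/v1/federation/vote",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def check_health(host: str, port: int = API_PORT, timeout: float = 2.0) -> bool:
    # A peer counts as healthy only if GET /v1/federation/health returns HTTP 200.
    url = f"http://{host}:{port}/v1/federation/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Refused connections, timeouts, and non-2xx replies (HTTPError) all land here.
        return False
```

Sending the vote is then `urllib.request.urlopen(build_vote_request(peer_host, prompt), timeout=...)`, with the response body decoded as JSON.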

3. Network Binding (main.py)

Purpose: Secure local network access without internet exposure.

Implementation:

  • get_local_ip() - Detects local network IP (192.x.x.x or 100.x.x.x)
  • Binds to specific local IP instead of 0.0.0.0
  • Falls back to localhost if not on private network
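The binding rule can be sketched in two steps: ask the OS which source address it would use for outbound traffic, then accept that address only if it falls in an allowed range. The ranges are an assumption: the text mentions 192.x.x.x and 100.x.x.x, which this sketch narrows to 192.168.0.0/16 and the 100.64.0.0/10 shared range (used by, for example, Tailscale). The actual get_local_ip() may check differently, and `detect_outbound_ip` / `choose_bind_host` are hypothetical names.

```python
from __future__ import annotations

import ipaddress
import socket

# Assumed "local" ranges; adjust to whatever main.py actually accepts.
LOCAL_NETWORKS = [
    ipaddress.ip_network("192.168.0.0/16"),
    ipaddress.ip_network("100.64.0.0/10"),
]


def detect_outbound_ip() -> str | None:
    # Connecting a UDP socket sends no packets; it only makes the OS choose
    # the interface and source address it would route through.
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            s.connect(("192.0.2.1", 80))  # TEST-NET-1 address, never contacted
            return s.getsockname()[0]
    except OSError:
        return None  # no usable route (e.g. offline)


def choose_bind_host(candidate: str | None) -> str:
    # Bind to the detected address only if it is in an allowed range;
    # otherwise fall back to loopback, never to 0.0.0.0.
    if candidate:
        addr = ipaddress.ip_address(candidate)
        if any(addr in net for net in LOCAL_NETWORKS):
            return candidate
    return "127.0.0.1"
```

The server would then bind to `choose_bind_host(detect_outbound_ip())`.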

What's Missing

Critical Gap: No Integration

The federation system exists as standalone modules but is NOT connected to the main application flow.

Specific Issues:

  1. No CLI Flag: No --federation or --enable-federation argument in main.py

  2. Discovery Never Starts:

    • SwarmDiscovery class is imported in network/__init__.py
    • But never instantiated or started in main.py
    • start_advertising() and start_discovery() are never called
  3. Federation Never Starts:

    • FederatedSwarm class exists but is never instantiated
    • main.py calls swarm.generate() directly
    • Should call federated_swarm.generate_with_federation() when enabled
  4. API Routes Not Registered:

    • Federation endpoints exist in federation.py but aren't added to FastAPI router
    • Routes in src/api/routes.py don't include /v1/federation/*
  5. No Peer Management UI:

    • No way to see discovered peers
    • No status dashboard for federation
    • No manual peer configuration

File Structure

src/network/
├── __init__.py           # Exports SwarmDiscovery, FederationClient, etc.
├── discovery.py          # mDNS/Bonjour discovery service
│   ├── SwarmDiscovery    # Main discovery class
│   ├── PeerInfo          # Peer information dataclass
│   └── create_discovery_service()  # Factory function
├── federation.py         # Inter-swarm communication
│   ├── FederationClient  # HTTP client for peers
│   ├── FederatedSwarm    # Wraps swarm with federation
│   ├── PeerVote          # Vote from peer
│   └── FederationResult  # Result of federated generation
└── (routes missing)      # Federation routes still to be added in src/api/routes.py

main.py                   # Should integrate federation here
  ├── Currently: Just runs local swarm
  └── Should: Optionally run federated swarm with discovery

Scope

In Scope

  • Automatic discovery of peers on same local network
  • Distributed generation across multiple machines
  • Consensus voting between local and peer responses
  • Health checking and peer timeout handling
  • Secure local network binding (no internet exposure)

Out of Scope (Future)

  • Internet-wide federation (would need authentication/encryption)
  • Cross-platform federation (Mac ↔ Linux ↔ Windows)
  • Peer authentication/authorization
  • Encrypted peer communication
  • WAN federation through NAT traversal
  • Peer reputation/scoring system

TODO

Phase 1: Basic Integration (Minimum Viable)

  1. Add --federation CLI flag to main.py

    • Add argument parser entry
    • Conditionally enable federation
  2. Integrate discovery in main flow

    # In main.py after swarm initialization:
    if args.federation:
        discovery = await create_discovery_service(args.port)
        # swarm_info carries the advertised fields:
        # version, instances, model_id, hardware_summary
        await discovery.start_advertising(swarm_info)
        await discovery.start_discovery()
    
  3. Add federation API routes to src/api/routes.py

    • POST /v1/federation/vote
    • GET /v1/federation/health
    • GET /v1/federation/peers (list discovered peers)
  4. Create FederatedSwarm wrapper

    # Replace: result = await swarm.generate(...)
    # With:
    if args.federation:
        federated = FederatedSwarm(swarm, discovery)
        result = await federated.generate_with_federation(...)
    else:
        result = await swarm.generate(...)
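
For step 1, the flag can be a plain argparse switch. This is a hypothetical excerpt of main.py's parser: `--auto` appears in the Usage section, while the `--port` option and its default of 17615 are assumed from the discovery section and from `args.port` in the step 2 snippet.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical subset of main.py's parser; only --federation is new here.
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("--auto", action="store_true")  # existing flag (see Usage)
    parser.add_argument("--port", type=int, default=17615,
                        help="API port (assumed default)")
    parser.add_argument("--federation", action="store_true",
                        help="advertise via mDNS, discover peers, and federate generation")
    return parser
```

Because `store_true` defaults to False, existing invocations keep running the local-only path unless `--federation` is passed.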
    

Phase 2: Polish

  1. Add peer status display

    • Show discovered peers in startup banner
    • Display peer count in status
    • Log when peers join/leave
  2. Handle edge cases

    • No peers available (fallback to local only)
    • All peers time out (graceful degradation)
    • Split-brain scenarios
  3. Configuration

    • Config file support for federation settings
    • Manual peer list (bypass discovery)
    • Federation strategy selection
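The first two edge cases (no peers, all peers timing out) can share one code path: run local generation alongside guarded peer requests and keep whatever arrives in time. The sketch assumes each participant is exposed as an async callable returning a response string; `gather_votes` and the 30-second budget are hypothetical.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable

PEER_TIMEOUT_S = 30.0  # hypothetical per-request budget for peer votes


async def gather_votes(
    local: Callable[[], Awaitable[str]],
    peers: list[Callable[[], Awaitable[str]]],
    timeout_s: float = PEER_TIMEOUT_S,
) -> list[str]:
    """Run local and peer generation concurrently; drop peers that fail or time out.

    With no peers, or if every peer fails, the result is just the local response,
    so federation degrades to local-only generation.
    """
    async def guarded(call):
        try:
            return await asyncio.wait_for(call(), timeout=timeout_s)
        except Exception:  # timeout, connection error, bad response, ...
            return None

    local_task = asyncio.create_task(local())
    peer_results = await asyncio.gather(*(guarded(p) for p in peers))
    local_result = await local_task  # local errors are not swallowed
    return [local_result] + [r for r in peer_results if r is not None]
```

The returned list can then be handed to the configured federation strategy; with a single entry, any strategy trivially returns the local answer.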

Phase 3: Testing

  1. Integration tests
    • Two instances on same machine
    • Two instances on same network
    • Peer timeout handling
    • Consensus validation

Usage (When Complete)

Start Federated Mode

    # On Mac 1 (192.168.1.100)
    python main.py --auto --federation

    # On Mac 2 (192.168.1.101)
    python main.py --auto --federation

    # Both will:
    # 1. Start local API on 192.168.x.x:17615
    # 2. Advertise via mDNS
    # 3. Discover each other within 5-10 seconds
    # 4. Distribute generation requests between them

Expected Behavior

  1. Both Macs advertise themselves via mDNS
  2. Each discovers the other within 10 seconds
  3. When a request comes in, both generate responses
  4. Consensus algorithm picks best response
  5. Result returned to client

Benefits When Complete

  • More workers: Combine instances across machines
  • Better consensus: More candidate responses for the consensus step to choose from
  • Load balancing: Distribute generation across devices
  • Redundancy: If one fails, others continue
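  • Heterogeneous hardware: Mix machines with different capabilities (mixing operating systems depends on cross-platform federation, listed above as out of scope)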

Current Workaround

Until federation is integrated, you can:

  1. Run instances independently on different machines
  2. Point clients to specific instances manually

Note that this workaround provides no automatic peer discovery or coordination.