# TODO: CUDA and Android Support in Federation

## Overview

This document tracks known issues and recommendations for adding CUDA (NVIDIA) and Android nodes to the local_swarm federation system.

## Current Status

- ✅ **Apple Silicon (macOS)**: Fully supported with MLX backend
- ⚠️ **CUDA/Android**: Not currently supported; requires implementation work
- ✅ **Linux**: Should work with llama.cpp + CUDA
- ✅ **Windows**: Should work with llama.cpp + CUDA (not tested)

## Known Issues

### 1. No CUDA Backend for macOS

**Problem:**
- `__init__.py` only chooses between MLX and llama.cpp
- No CUDA path for macOS
- Apple Silicon only supports Metal acceleration, not CUDA

**Impact:**
- CUDA/Android nodes on macOS cannot use GPU acceleration
- These nodes fall back to CPU-only mode

**References:**
- `src/backends/__init__.py` (lines 26-32)
- `src/hardware/detector.py` (Apple Silicon detection)

**Recommendation:**
- The current architecture is correct for macOS
- CUDA is not supported on Apple Silicon
- Would need a separate CUDA backend implementation (not recommended)

---

### 2. Platform Detection in `hardware/detector.py`

**Current Detection:**

```python
def detect_gpu():
    # macOS: Apple Silicon (Metal only, no CUDA)
    # Linux/Windows: NVIDIA/AMD/Intel GPU (potential CUDA)
    # Android/Termux: CPU-only (no GPU)
    ...
```

**Impact:**
- Android/Termux devices are detected as Linux
- They use CPU-only mode (expected)
- No special handling for the Android platform

**Potential Issue:**
- Termux on Android reports as "linux"
- It may have different requirements (file paths, permissions)
- Need to test whether file paths work correctly on Android

**References:**
- `src/hardware/detector.py:170-221` (Android/Termux detection via `is_termux()`)

**Recommendation:**
- Add explicit Android platform detection beyond `is_termux()` (see the sketch below)
- Test file path handling on Termux
- Consider Android's unique file system limitations
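As a starting point for that first recommendation, here is a minimal sketch of what explicit detection could look like. `detect_android()` is a hypothetical name (only `is_termux()` exists in the codebase today), and it relies on environment variables that Termux (`TERMUX_VERSION`, `PREFIX`) and Android (`ANDROID_ROOT`) are known to set:

```python
import os
import platform


def detect_android() -> str | None:
    """Hypothetical helper: distinguish Termux/Android from plain Linux.

    platform.system() reports "Linux" on Android, so this falls back to
    environment variables that Termux and Android set.
    """
    if platform.system() != "Linux":
        return None
    # Termux exports TERMUX_VERSION and points PREFIX at its own rootfs.
    if "TERMUX_VERSION" in os.environ or os.environ.get(
        "PREFIX", ""
    ).startswith("/data/data/com.termux"):
        return "termux"
    # Non-Termux Android environments (e.g. adb shell) still expose ANDROID_ROOT.
    if "ANDROID_ROOT" in os.environ:
        return "android"
    return None  # Regular Linux
```

Note that Termux processes can generally only write under `$HOME` and `$PREFIX`, which is why the file path testing called out above matters.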
---

### 3. Llama.cpp Backend Configuration

**Current GPU Layer Logic:**

```python
# src/backends/__init__.py (line 35)
if hardware.gpu and not hardware.is_apple_silicon:
    n_gpu_layers = -1  # Offload all to GPU (Metal/CUDA)
else:
    n_gpu_layers = 0  # CPU-only
```

**For CUDA Support on Linux:**
- Should set `n_gpu_layers` from the hardware actually present (it counts offloaded model layers, not GPUs)
- NVIDIA: offload as many layers as VRAM allows; `-1` when the whole model fits
- AMD ROCm: different backend, not tested

**Impact:**
- Currently a blanket `-1` (offload everything) for any non-Apple GPU, `0` otherwise
- CUDA nodes on Linux need proper layer configuration
- No validation that requested layers match the available GPU

**References:**
- `src/backends/llamacpp.py` (line 16, `n_gpu_layers` parameter)
- `src/backends/__init__.py` (line 35)

**Recommendation:**
- Make `n_gpu_layers` configurable per backend
- Auto-detect GPU capabilities from `pynvml` or the system (see the sketch below)
- Add GPU layer validation
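To illustrate the auto-detection idea, the sketch below uses `pynvml` (Python bindings for NVIDIA's management library) to pick a value. `choose_gpu_layers()` and its all-or-nothing policy are illustrative assumptions, not existing code:

```python
import pynvml  # pip install nvidia-ml-py


def choose_gpu_layers(model_vram_bytes: int) -> int:
    """Hypothetical helper: pick an n_gpu_layers value for llama.cpp.

    Returns -1 (offload everything) only when an NVIDIA GPU reports
    enough free VRAM for the model; otherwise returns 0 (CPU-only).
    """
    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError:
        return 0  # No NVIDIA driver/GPU available: stay on CPU
    try:
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
            if mem.free >= model_vram_bytes:
                return -1  # Whole model fits on this GPU: offload all layers
        return 0  # GPUs exist, but none has enough free VRAM
    finally:
        pynvml.nvmlShutdown()
```

A fuller implementation would map free VRAM to a partial layer count and emit a helpful error when the requested layers exceed the hardware, per the validation item above.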
---

### 4. Seed Variation Mode (Not an Issue, but Important)

**Current Behavior:**

```python
# src/swarm/manager.py (lines 76-82)
if use_seed_variation is None and hardware.is_apple_silicon:
    self.use_seed_variation = True  # Auto-enabled on macOS
```

**How It Works:**
- Runs one model instance with different random seeds
- Simulates multiple "workers" for consensus
- Saves memory by not loading multiple models

**Impact on Federation:**
- Your Mac: 1 worker → 2 votes (from 2 seeds)
- Peer Mac: 2 workers → 2 votes (from 2 seeds)
- Total: 4 votes instead of the 8 you would get from 4 actual instances per machine

This is **correct behavior** for seed variation mode.

**Recommendation:**
- To get 4 votes per machine (8 total), use the `--instances 4` flag
- Seed variation is a design choice, not a bug

---

### 5. Federation Client Timeout

**Status:** ✅ **FIXED**

**Previous:**
- Default timeout: 30 seconds
- Peers on slow networks or slow machines would time out

**Current:**
- Default timeout: 60 seconds (increased in `src/network/federation.py:38`)
- Gives peers more time to respond

**References:**
- `src/network/federation.py` (line 38)

**Recommendation:**
- The current 60s is reasonable
- Consider making the timeout configurable per peer in discovery
- Add retry logic for failed requests

---

### 6. Network Discovery

**Current Implementation:** ✅ **PLATFORM AGNOSTIC**

**Uses:**
- mDNS/Bonjour for peer discovery
- Standard network protocols
- No platform-specific blocking

**Status:** Should work on all platforms (macOS, Linux, Windows, Android)

**References:**
- `src/network/discovery.py` (standard mDNS implementation)

**Recommendation:**
- No changes needed
- Test on Linux/Windows/Android if needed

---

## Implementation Priorities

### High Priority (Breaking Features)

1. **CUDA Backend for Linux** (if needed)
   - Add a CUDA-specific backend or extend llama.cpp
   - Auto-detect NVIDIA GPUs and configure layers
   - Test on actual CUDA hardware
   - **Effort:** 3-5 days

2. **Android Platform Detection**
   - Add explicit Android detection beyond Termux
   - Handle Android's file system and package manager differences
   - Test on a real Android device
   - **Effort:** 2-3 days

### Medium Priority (Improvements)

1. **GPU Layer Auto-Configuration**
   - Auto-detect GPU capabilities from the system
   - Match requested layers to available hardware
   - Add validation and helpful error messages
   - **Effort:** 1-2 days

2. **Federation Metrics**
   - Add per-peer timeout in PeerInfo
   - Track latency and success rates
   - Better error handling for retry logic
   - **Effort:** 1 day

### Low Priority (Nice to Have)

1. **GPU Backend Selection UI**
   - Allow users to manually select MLX vs llama.cpp
   - Warn when the CUDA backend is selected on macOS (not supported)
   - **Effort:** 2 hours

2. **Seed Variation Toggle**
   - Add a command-line flag to disable seed variation
   - Document the trade-offs clearly
   - **Effort:** 30 minutes

## Testing Checklist

Before marking any issue as complete, test on:

### macOS (Apple Silicon)
- [ ] Federation with macOS peers (current environment)
- [ ] Seed variation mode works correctly
- [ ] MLX backend loads and generates
- [ ] No crashes with multiple instances

### Linux (NVIDIA GPU)
- [ ] llama.cpp backend loads with CUDA support
- [ ] Federation with Linux peers works
- [ ] GPU layers configured correctly
- [ ] No GPU conflicts

### Windows (NVIDIA GPU)
- [ ] llama.cpp backend loads with CUDA support
- [ ] Federation with Windows peers works
- [ ] No GPU conflicts

### Android (CPU-only)
- [ ] Federation with Android peers works (mDNS should work)
- [ ] CPU-only generation works
- [ ] File paths work on Termux/Android

## Notes

### Architecture Decisions

**Why not per-platform backends:**
- Simplifies the codebase (single MLX path, single llama.cpp path)
- Reduces maintenance burden
- Trade-off: can't optimize for platform-specific GPUs in backends

**Why seed variation on macOS:**
- Apple Silicon has unified memory, not discrete VRAM
- Loading multiple models would consume too much RAM
- Seed variation allows consensus quality with one model instance

**CUDA/Android is not a bug:**
- The current system is designed for Apple Silicon + llama.cpp
- Adding CUDA support requires significant architecture work
- Focus on federation quality for current platforms first

## Related Files

- `src/backends/__init__.py` - Backend selection logic
- `src/backends/mlx.py` - Apple Silicon MLX backend
- `src/backends/llamacpp.py` - llama.cpp backend (supports CUDA)
- `src/hardware/detector.py` - Platform and GPU detection
- `src/network/federation.py` - Federation communication
- `src/network/discovery.py` - Peer discovery via mDNS
- `src/swarm/manager.py` - Swarm orchestration

## Conclusion

The current federation implementation is **platform-agnostic** and should work on Linux/Windows with CUDA nodes. The main limitation is that macOS (Apple Silicon) only supports Metal/MLX, not CUDA.

**For immediate use:**
- Use the `--instances 4` flag on each machine to get 4 votes per machine
- Test federation between different platforms (macOS + Linux)
- Android/Termux should work as-is (CPU-only mode)

**For future work:**
- Implement the high-priority items if CUDA/Android support is needed
- Add GPU layer auto-configuration for better hardware utilization