sleepy d33fa406b6 feat: CUDA/Android support and federation metrics (#7)
* optimize(federation): run local and peer generation in parallel

Previously, the federation waited for local generation to complete
before asking peers to generate. This wasted time since peers sat
idle while the host generated.

Now local swarm and all peers generate simultaneously:
- Fire local generation AND peer requests at the same time
- Wait for all to complete with asyncio.gather()
- Then run global consensus

This roughly halves total generation time when using
federation with multiple nodes.

Changes:
- Modified generate_with_federation() to run tasks in parallel
- Updated logging to reflect parallel execution
- Added proper error handling for local generation failures
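The parallel pattern described above can be sketched with `asyncio.gather`. This is an illustrative reconstruction, not the project's actual `generate_with_federation()`; the `local_generate` and `peer_generate` callables are hypothetical stand-ins for the local swarm and peer request functions:

```python
import asyncio

async def generate_with_federation(prompt, peers, local_generate, peer_generate):
    """Fire local generation and all peer requests at once, then gather
    every result before running consensus (sketch; names are assumptions)."""
    tasks = [asyncio.create_task(local_generate(prompt))]
    tasks += [asyncio.create_task(peer_generate(peer, prompt)) for peer in peers]
    # return_exceptions=True so one failed peer doesn't sink the whole batch
    results = await asyncio.gather(*tasks, return_exceptions=True)
    # Drop failures; consensus runs over whatever completed successfully
    return [r for r in results if not isinstance(r, Exception)]
```

Because all tasks start before any `await`, total wall time is bounded by the slowest node rather than the sum of local and peer generation.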

* feat(federation): add federation support to streaming path

Previously, federation only worked with non-streaming requests.
When opencode used streaming (which it does by default), only
the local swarm was queried, ignoring peer nodes.

Now when federation is enabled and peers exist:
- Start federation generation in background (parallel)
- Stream from local swarm immediately
- Log federation results when complete

This enables federation to work with opencode and other
streaming clients while maintaining fast streaming response.

Also added webfetch instructions to prevent hallucinating URLs.

Changes:
- Modified streaming path to detect and use federation
- Added asyncio import
- Updated tool instructions to prevent URL hallucination

* fix(federation): wait for consensus and use federated result in streaming

Changed federation in streaming mode to:
- Wait for ALL nodes to complete generation
- Use the consensus result (not just local)
- Stream the federated response to client

This ensures voting from all nodes is properly considered.

Previous implementation streamed locally while federation ran
in background for logging only, which ignored the consensus.

* fix(federation): properly stream federated response

The federation case was setting the response but not returning
a StreamingResponse, so nothing was sent back to the client.

Added proper streaming generator for federation results that:
- Sends role chunk
- Streams content in chunks
- Sends final [DONE] chunk

This fixes the issue where opencode only saw local node output.
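The chunk sequence above (role delta, content deltas, final chunk, `[DONE]`) can be sketched as an OpenAI-compatible SSE generator. This is illustrative code following the standard `chat.completion.chunk` framing, not the project's exact implementation; `model` and `chunk_size` are assumed parameters:

```python
import json

def federation_stream_chunks(text, model="local-swarm", chunk_size=32):
    """Yield SSE-framed chunks for a fully materialized federated response:
    one role delta, then content deltas, then a stop chunk and [DONE]."""
    def sse(payload):
        return f"data: {json.dumps(payload)}\n\n"

    base = {"object": "chat.completion.chunk", "model": model}
    # 1. Role chunk so clients initialize an assistant message
    yield sse({**base, "choices": [{"delta": {"role": "assistant"}, "index": 0}]})
    # 2. Content streamed in fixed-size slices
    for i in range(0, len(text), chunk_size):
        yield sse({**base, "choices": [{"delta": {"content": text[i:i + chunk_size]},
                                        "index": 0}]})
    # 3. Final chunk with finish_reason, then the terminator
    yield sse({**base, "choices": [{"delta": {}, "finish_reason": "stop", "index": 0}]})
    yield "data: [DONE]\n\n"
```

Wrapping this generator in a `StreamingResponse` is what actually delivers the federated text to the client.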

* feat(federation): add winner tracking and token usage reporting

- Track which node won the consensus voting (local or peer name)
- Add winner to FederationResult dataclass
- Log winner in server logs
- Calculate and report token usage in federation streaming
- Fix prompt_tokens calculation in streaming path

Now opencode will show:
- Context tokens used
- Which node won the vote (in logs)

* fix(federation): parse tool calls from federated response

Federation now properly handles tools:
- Removed 'not has_tools' condition so federation works with tools
- Added tool call parsing for federated responses
- Returns proper tool_calls delta with finish_reason=tool_calls
- Falls through to content streaming when no tool calls

This fixes opencode issue where federation was skipped
when tools were present.

* fix(federation): fix token count scope issue in generators

The async generators couldn't access the token count variables
because they were in the outer function scope. Fixed by:
- Calculating token counts inside each generator function
- Using separate local variable names to avoid scope issues
- Both tool_calls and content streaming now work correctly

* config(federation): increase peer timeout from 30s to 60s

Federation client timeout determines how long to wait for
peer responses before giving up and falling back to local result.

Changed from 30s to 60s to give peers more time to respond
especially on slower networks or machines.

* feat(federation): add CUDA/Android support and peer metrics tracking

Changes:
- GPU layer auto-configuration based on hardware detection
  - Offload all layers for Apple Silicon
  - Configure NVIDIA layers based on GPU count and compute capability
  - Add GPU device count and compute capability tracking

- Android platform detection
  - Detect Android via environment variables and file paths
  - Check /proc/sys/kernel/osrelease for kernel version
  - Normalize Android file paths (~ expansion, /sdcard alternatives)
  - Android-specific paths in hardware/qualcomm.py

- Federation metrics tracking
  - Add PeerMetrics dataclass with success rate, avg latency, error tracking
  - Track total requests, successful requests, failed requests
  - Record last error with timestamp
  - Add success_rate property (auto-calculated)

- Peer-specific timeout configuration
  - Add timeout_seconds to PeerInfo dataclass
  - Use peer-specific timeout in FederationClient requests
  - Use aiohttp.ClientTimeout for proper timeout handling
  - Track request start time for accurate latency calculation

- Comprehensive tests
  - test_hardware_detector.py: 14 test cases for GPU detection and Android
  - test_federation_metrics.py: 13 test cases for metrics and timeouts
  - All 35 tests pass (100% pass rate)

- Documentation
  - Add TODO.md with CUDA/Android implementation status
  - Document known issues and recommendations
  - Testing checklist and implementation priorities

Token impact: No prompt changes
Tests: 35/35 passing

Resolves federation timeout and observability issues.
2026-02-25 00:53:07 +01:00

# TODO: CUDA and Android Support in Federation
## Overview
This document tracks known issues and recommendations for adding CUDA (NVIDIA) and Android nodes to the local_swarm federation system.
## Current Status
- **Apple Silicon (macOS)**: Fully supported with MLX backend
- ⚠️ **CUDA/Android**: Not currently supported, requires implementation work
- **Linux**: Should work with llama.cpp + CUDA
- **Windows**: Should work with llama.cpp + CUDA (not tested)
## Known Issues
### 1. No CUDA Backend for macOS
**Problem:**
- `__init__.py` only chooses MLX or llama.cpp
- No CUDA path for macOS
- Apple Silicon only supports Metal acceleration, not CUDA
**Impact:**
- CUDA/Android nodes on macOS cannot use GPU acceleration
- These nodes will fall back to CPU-only mode
**References:**
- `src/backends/__init__.py` (lines 26-32)
- `src/hardware/detector.py` (Apple Silicon detection)
**Recommendation:**
- Current architecture is correct for macOS - CUDA is not supported on Apple Silicon
- Would need separate CUDA backend implementation (not recommended)
---
### 2. Platform Detection in `hardware/detector.py`
**Current Detection:**
```python
def detect_gpu():
    # macOS: Apple Silicon (Metal only, no CUDA)
    # Linux/Windows: NVIDIA/AMD/Intel GPU (potential CUDA)
    # Android/Termux: CPU-only (no GPU)
```
**Impact:**
- Android/Termux devices detected as Linux
- Will use CPU-only mode (expected)
- No special handling for Android platform
**Potential Issue:**
- Termux on Android reports as "linux"
- May have different requirements (file paths, permissions)
- Need to test if file paths work correctly on Android
**References:**
- `src/hardware/detector.py:170-221` (Android/Termux detection via `is_termux()`)
**Recommendation:**
- Add explicit Android platform detection beyond `is_termux()`
- Test file path handling on Termux
- Consider Android's unique file system limitations
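A minimal sketch of the explicit Android detection recommended above, assuming Termux's well-known environment variables and Android filesystem landmarks (the project's actual signals may differ):

```python
import os
from pathlib import Path

def is_android() -> bool:
    """Detect Android/Termux beyond plain platform.system() == 'Linux'.
    Checks Termux env vars first, then Android-specific paths (sketch)."""
    # Termux exports TERMUX_VERSION and a PREFIX under com.termux
    if "TERMUX_VERSION" in os.environ:
        return True
    if "com.termux" in os.environ.get("PREFIX", ""):
        return True
    # Stock Linux lacks these Android system paths
    return Path("/system/build.prop").exists() or Path("/sdcard").exists()
```

Callers could then branch on `is_android()` to normalize paths (`~` expansion, `/sdcard` alternatives) before touching the filesystem.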
---
### 3. Llama.cpp Backend Configuration
**Current GPU Layer Logic:**
```python
# src/backends/__init__.py (line 35)
if hardware.gpu and not hardware.is_apple_silicon:
    n_gpu_layers = -1  # Offload all to GPU (Metal/CUDA)
else:
    n_gpu_layers = 0  # CPU-only
```
**For CUDA Support on Linux:**
- Should set `n_gpu_layers` based on actual GPU count
- NVIDIA: Set to GPU count (1-8 for multi-GPU)
- AMD ROCm: Different backend, not tested
**Impact:**
- Currently hardcoded to -1 on Apple Silicon (Metal)
- CUDA nodes on Linux need proper layer configuration
- No validation that requested layers match available GPU
**References:**
- `src/backends/llamacpp.py` (line 16, n_gpu_layers parameter)
- `src/backends/__init__.py` (line 35)
**Recommendation:**
- Make `n_gpu_layers` configurable per backend
- Auto-detect GPU capabilities from `pynvml` or system
- Add GPU layer validation
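The recommended auto-configuration could look like the following sketch. The policy (offload everything on Apple Silicon or modern NVIDIA GPUs, partial offload on older cards) and the threshold values are assumptions for illustration, not the project's actual rules:

```python
from typing import Optional

def choose_gpu_layers(is_apple_silicon: bool, gpu_count: int,
                      compute_capability: Optional[float]) -> int:
    """Pick an n_gpu_layers value from detected hardware (sketch).
    -1 is llama.cpp's convention for 'offload all layers'."""
    if is_apple_silicon:
        return -1  # Metal: unified memory, offload everything
    if gpu_count > 0 and (compute_capability or 0.0) >= 6.0:
        return -1  # modern CUDA GPU (Pascal or newer): offload everything
    if gpu_count > 0:
        return 20  # older GPU: partial offload (illustrative number)
    return 0  # CPU-only
```

The compute capability and GPU count would come from the hardware detector (e.g. via `pynvml` on NVIDIA systems), with validation errors raised when a requested layer count exceeds what the hardware supports.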
---
### 4. Seed Variation Mode (Not an Issue, but Important)
**Current Behavior:**
```python
# src/swarm/manager.py (lines 76-82)
if use_seed_variation is None and hardware.is_apple_silicon:
    self.use_seed_variation = True  # Auto-enabled on macOS
```
**How It Works:**
- Runs 1 model instance with different random seeds
- Simulates multiple "workers" for consensus
- Saves memory by not loading multiple models
**Impact on Federation:**
- Your Mac: 1 worker → 2 votes (from 2 seeds)
- Peer Mac: 2 workers → 2 votes (from 2 seeds)
- Total: 4 votes instead of 8 (if using 4 actual instances)
**This is CORRECT behavior** for seed variation mode.
**Recommendation:**
- To get 4 votes per machine (8 total), use `--instances 4` flag
- Seed variation is a design choice, not a bug
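Seed variation boils down to reseeding the sampler between generations on a single loaded model, so each run counts as one consensus vote. A minimal sketch (the `generate` callable and reseeding mechanism are stand-ins for the backend's actual sampling path):

```python
import random

def seed_variation_votes(generate, prompt: str, n_seeds: int = 2):
    """Produce n_seeds candidate answers from ONE model instance by
    varying the sampling seed, instead of loading n_seeds models (sketch)."""
    votes = []
    for seed in range(n_seeds):
        random.seed(seed)  # vary sampling without another model in memory
        votes.append(generate(prompt))
    return votes
```

Memory cost stays at one model regardless of `n_seeds`, which is why this mode is the default on unified-memory Apple Silicon.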
---
### 5. Federation Client Timeout
**Status:** **FIXED**
**Previous:**
- Default timeout: 30 seconds
- Peers on slow networks or slow machines would timeout
**Current:**
- Default timeout: 60 seconds (increased in `src/network/federation.py:38`)
- Gives peers more time to respond
**References:**
- `src/network/federation.py` (line 38)
**Recommendation:**
- Current 60s is reasonable
- Consider making timeout configurable per peer in discovery
- Add retry logic for failed requests
---
### 6. Network Discovery
**Current Implementation:** **PLATFORM AGNOSTIC**
**Uses:**
- mDNS/Bonjour for peer discovery
- Standard network protocols
- No platform-specific blocking
**Status:** Should work on all platforms (macOS, Linux, Windows, Android)
**References:**
- `src/network/discovery.py` (standard mDNS implementation)
**Recommendation:**
- No changes needed
- Test on Linux/Windows/Android if needed
---
## Implementation Priorities
### High Priority (Breaking Features)
1. **CUDA Backend for Linux** (if needed)
- Add CUDA-specific backend or extend llama.cpp
- Auto-detect NVIDIA GPU and configure layers
- Test on actual CUDA hardware
- **Effort:** 3-5 days
2. **Android Platform Detection**
- Add explicit Android detection beyond Termux
- Handle Android's file system and package manager differences
- Test on real Android device
- **Effort:** 2-3 days
### Medium Priority (Improvements)
1. **GPU Layer Auto-Configuration**
- Auto-detect GPU capabilities from system
- Match requested layers to available hardware
- Add validation and helpful error messages
- **Effort:** 1-2 days
2. **Federation Metrics**
- Add per-peer timeout in PeerInfo
- Track latency and success rates
- Better error handling for retry logic
- **Effort:** 1 day
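The metrics tracking above could be a small dataclass like the following sketch. Field and method names are assumptions based on this TODO, not the shipped `PeerMetrics`:

```python
import time
from dataclasses import dataclass
from typing import Optional

@dataclass
class PeerMetrics:
    """Per-peer request statistics for the federation client (sketch)."""
    total_requests: int = 0
    successful_requests: int = 0
    failed_requests: int = 0
    total_latency: float = 0.0
    last_error: Optional[str] = None
    last_error_time: Optional[float] = None

    @property
    def success_rate(self) -> float:
        if self.total_requests == 0:
            return 0.0
        return self.successful_requests / self.total_requests

    @property
    def avg_latency(self) -> float:
        if self.successful_requests == 0:
            return 0.0
        return self.total_latency / self.successful_requests

    def record_success(self, latency: float) -> None:
        self.total_requests += 1
        self.successful_requests += 1
        self.total_latency += latency

    def record_failure(self, error: str) -> None:
        self.total_requests += 1
        self.failed_requests += 1
        self.last_error = error
        self.last_error_time = time.time()
```

Retry logic could then consult `success_rate` and `avg_latency` to deprioritize flaky or slow peers.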
### Low Priority (Nice to Have)
1. **GPU Backend Selection UI**
- Allow users to manually select MLX vs llama.cpp
- Add warning for CUDA backend on macOS (not supported)
- **Effort:** 2 hours
2. **Seed Variation Toggle**
- Add command-line flag to disable seed variation
- Document the trade-offs clearly
- **Effort:** 30 minutes
## Testing Checklist
Before marking any issue as complete, test on:
### macOS (Apple Silicon)
- [ ] Federation with macOS peers (current environment)
- [ ] Seed variation mode works correctly
- [ ] MLX backend loads and generates
- [ ] No crashes with multiple instances
### Linux (NVIDIA GPU)
- [ ] llama.cpp backend loads with CUDA support
- [ ] Federation with Linux peers works
- [ ] GPU layers configured correctly
- [ ] No GPU conflicts
### Windows (NVIDIA GPU)
- [ ] llama.cpp backend loads with CUDA support
- [ ] Federation with Windows peers works
- [ ] No GPU conflicts
### Android (CPU-only)
- [ ] Federation with Android peers works (mDNS should work)
- [ ] CPU-only generation works
- [ ] File paths work on Termux/Android
## Notes
### Architecture Decisions
**Why not per-platform backends:**
- Simplifies codebase (single MLX path, single llama.cpp path)
- Reduces maintenance burden
- Trade-off: Can't optimize for platform-specific GPUs in backends
**Why seed variation on macOS:**
- Apple Silicon has unified memory, not discrete VRAM
- Loading multiple models would consume too much RAM
- Seed variation allows consensus quality with 1 model instance
**CUDA/Android is not a bug:**
- Current system is designed for Apple Silicon + llama.cpp
- Adding CUDA support requires significant architecture work
- Focus on federation quality for current platforms first
## Related Files
- `src/backends/__init__.py` - Backend selection logic
- `src/backends/mlx.py` - Apple Silicon MLX backend
- `src/backends/llamacpp.py` - llama.cpp backend (supports CUDA)
- `src/hardware/detector.py` - Platform and GPU detection
- `src/network/federation.py` - Federation communication
- `src/network/discovery.py` - Peer discovery via mDNS
- `src/swarm/manager.py` - Swarm orchestration
## Conclusion
The current federation implementation is **platform-agnostic** and should work on Linux/Windows with CUDA nodes. The main limitation is that macOS (Apple Silicon) only supports Metal/MLX, not CUDA.
**For immediate use:**
- Use `--instances 4` flag on each machine to get 4 votes per machine
- Test federation between different platforms (macOS + Linux)
- Android/Termux should work as-is (CPU-only mode)
**For future work:**
- Implement high-priority items if CUDA/Android support is needed
- Add GPU layer auto-configuration for better hardware utilization