133 Commits

Author SHA1 Message Date
sleepy 2461f45ca8 fix: Remove slow HF API check from recommended config selection
- select_optimal_model was checking HF API for available quantizations
- This caused the menu to hang or slow down when changing context
- Now only checks availability when browsing or custom config
- Recommended config uses default quantizations (faster)
2026-02-23 23:54:57 +01:00
sleepy f2d0fddfa4 fix: Update selector to check available quantizations on Mac 2026-02-23 23:52:29 +01:00
sleepy cb8e05e627 feat: Check available quantizations on Mac before showing menu
- Update list_models() and build_models() to accept check_available parameter
- Update interactive.py to pass check_available=True on Mac
- Menu now filters out non-existent quantizations in real-time
- Users can only select quantizations that actually exist on HF

This prevents the issue where a user selects 4bit but the system
tries to download 5bit, because only certain quants exist for each model.
2026-02-23 23:52:06 +01:00
sleepy 8028df7150 feat: Filter MLX quantizations to only show available ones
- Add filter_available_mlx_quants() to check HuggingFace for existing repos
- Update build_model_variants() to optionally check availability
- Menu will now only show quantizations that actually exist
- Prevents users from selecting non-existent quantizations

Note: This adds a small delay when building models, as it checks the HF API,
but it prevents download failures later.
2026-02-23 23:50:12 +01:00
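A minimal sketch of what the availability check might look like, assuming the mlx-community naming convention (base repo plus a `-4bit`-style suffix); `filter_available_mlx_quants` is the name from the commit, the body is illustrative:

```python
from huggingface_hub import HfApi

MLX_QUANTS = ["3bit", "4bit", "5bit", "6bit", "8bit"]

def filter_available_mlx_quants(base_repo: str) -> list[str]:
    """Keep only the quantizations whose HF repo actually exists."""
    api = HfApi()
    available = []
    for quant in MLX_QUANTS:
        repo_id = f"{base_repo}-{quant}"  # e.g. mlx-community/Qwen2.5-Coder-7B-Instruct-4bit
        try:
            if api.repo_exists(repo_id):  # one HTTP round-trip per quant -> the noted delay
                available.append(quant)
        except Exception:
            available.append(quant)       # on network errors, assume the quant exists
    return available
```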
sleepy e323d43d2b feat: Validate MLX models exist before download and suggest alternatives
- Add _validate_mlx_model_exists() to check HuggingFace repos
- Show warning when selected quantization doesn't exist
- List available quantizations for the model
- Better error messages with suggestions

This prevents trying to download non-existent quantizations like 5bit
when only 3bit, 4bit, 6bit, 8bit are available.
2026-02-23 23:48:53 +01:00
sleepy 792c40594e fix: Recommended config shows 3 responses on Mac instead of 1
Updated _try_model_with_context and _try_smallest_variant_with_context:
- On Mac (use_mlx=True): Returns 3 responses by default
- On other platforms: Still calculates based on VRAM
- Memory calculation fixed for Mac (doesn't multiply by response count)

Fixes issue where recommended config showed 'Responses: 1' on Mac
2026-02-23 23:46:01 +01:00
sleepy a4049f1c35 fix: Correct memory calculations for Mac seed variation mode
On Mac (Apple Silicon) with seed variation:
- Total memory no longer multiplied by number of responses
- Memory is shared across all responses (same model, different seeds)
- list_available_configurations: Uses 3 responses, single memory calculation
- custom_configuration: Memory doesn't scale with response count
- show_startup_summary: Shows '(shared)' for RAM on Mac
- All memory displays now accurate for seed variation mode
2026-02-23 23:42:47 +01:00
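The arithmetic behind the fix, with hypothetical sizes (the project's actual formula may differ):

```python
model_gb = 4.2       # e.g. a 7B model at 4-bit (hypothetical figure)
kv_cache_gb = 1.0    # KV cache for the chosen context (hypothetical figure)
responses = 3

# Before: memory scaled with response count, as if each response loaded its own model.
naive_total = (model_gb + kv_cache_gb) * responses   # 15.6 GB -- wrong on Mac

# After: seed variation reuses one loaded model, so memory is shared.
shared_total = model_gb + kv_cache_gb                # 5.2 GB -- what is actually used
```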
sleepy 884cb798a5 feat: UI shows 'responses' instead of 'instances' on Mac
- On Apple Silicon, UI terminology changed from 'instances' to 'responses'
- Mac default: 3 responses (configurable 2-5)
- Non-Mac: Still uses memory-based calculation
- Added explanation that seed variation keeps memory constant
- Menu and prompts updated to show appropriate terminology
2026-02-23 23:38:31 +01:00
sleepy 411295acba feat: Add seed variation and reviewer modes for Apple Silicon
- Add use_seed_variation mode: Generate multiple responses from one model
  with different random seeds (saves memory on Apple Silicon)
- Add enable_reviewer mode: A critic worker validates consensus results
  and triggers retries if output looks suspicious
- Add generate_with_seed_variation() method for single-model multi-response
- Add generate_with_reviewer() method with feedback loop
- Auto-enable seed variation on Apple Silicon to save memory
- Configurable max_retries for reviewer mode
2026-02-23 23:33:43 +01:00
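A sketch of the seed-variation loop, assuming mlx-lm and stochastic sampling (with greedy decoding every seed would produce identical output); `generate_with_seed_variation` is the commit's method name, the body is illustrative:

```python
import mlx.core as mx
from mlx_lm import load, generate

def generate_with_seed_variation(repo: str, prompt: str, n: int = 3) -> list[str]:
    model, tokenizer = load(repo)   # the model is loaded exactly once
    responses = []
    for seed in range(n):
        mx.random.seed(seed)        # only the sampling seed changes per response
        responses.append(generate(model, tokenizer, prompt=prompt, max_tokens=1024))
    return responses
```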
sleepy 93f5788d74 feat: Add tool calling support to API
- Add Tool, ToolCall, FunctionDefinition models
- Format prompts with tool descriptions for Qwen models
- Parse tool calls from model output (JSON and function call patterns)
- Auto-disable streaming when tools are present
- Return tool_calls in API response with proper finish_reason
- Support both simple function calls and JSON tool_calls format
2026-02-23 23:08:47 +01:00
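One plausible shape for the JSON-pattern side of the parsing (illustrative; per the commit, the real parser also handles function-call syntax and a tool_calls wrapper):

```python
import json

def _balanced_json_spans(text: str):
    """Yield top-level {...} substrings via naive brace matching (ignores braces inside strings)."""
    depth, start = 0, None
    for i, ch in enumerate(text):
        if ch == "{":
            if depth == 0:
                start = i
            depth += 1
        elif ch == "}" and depth:
            depth -= 1
            if depth == 0:
                yield text[start : i + 1]

def parse_tool_calls(text: str) -> list[dict]:
    calls = []
    for span in _balanced_json_spans(text):
        try:
            obj = json.loads(span)
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict) and "name" in obj and "arguments" in obj:
            calls.append(obj)   # looks like {"name": ..., "arguments": {...}}
    return calls
```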
sleepy 472961cc23 feat: Apple Silicon MLX support, sequential workers, live status display, worker names
Major improvements for macOS/Apple Silicon:
- Add spawn-based multiprocessing for Metal GPU compatibility
- Implement sequential generation mode for multiple workers
- Each worker runs one at a time to avoid GPU conflicts
- All workers stay loaded in memory for fast switching

User Experience:
- 100 unique worker names (Alpha, Raven, Zeus, etc.)
- Live terminal status display with progress bars
- Show context usage and last output per worker
- Display IP addresses for network workers

Configuration:
- Default port changed to 17615 (from 8000)
- Context size options: 16K, 32K (default), 64K, 128K
- Offloading options: none, 20%, 50%
- Default max_tokens: 1024

MLX Quantization Support:
- Support 3bit, 4bit, 5bit, 6bit, 8bit MLX models
- Proper memory calculations for each quantization
- Sequential mode automatically enabled on Apple Silicon

Bug Fixes:
- Fix instance calculation (was always returning 1)
- Fix quantization bit detection for MLX models
- Fix config.json generation in model folders
- Preload MiniLM embedding model during init

Files Changed:
- main.py: Spawn method for macOS, port 17615
- src/backends/mlx.py: MLX generation with stop sequences
- src/models/selector.py: Fix instance calculation
- src/swarm/manager.py: Sequential generation mode
- src/swarm/consensus.py: Preload embedding model
- src/swarm/worker.py: Progress tracking per worker
- src/swarm/worker_names.py: 100 unique names (NEW)
- src/swarm/status_monitor.py: Live display (NEW)
- src/interactive.py: Context/offload menus
- src/models/registry.py: MLX quantization sizes
- src/api/server.py: Port 17615, live status
2026-02-23 22:57:38 +01:00
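The spawn requirement in a nutshell: on macOS a forked child inherits state that breaks Metal, so worker processes must be spawned fresh. A minimal sketch (worker body elided):

```python
import multiprocessing as mp

def _worker_entry(name: str, model_path: str) -> None:
    ...  # load the model here, after the spawn, so Metal state is created in-process

if __name__ == "__main__":
    mp.set_start_method("spawn", force=True)   # fork is unsafe with Metal on macOS
    proc = mp.Process(target=_worker_entry, args=("Alpha", "/path/to/model"))
    proc.start()
    proc.join()
```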
sleepy cbcba954ae Add CONTEXT.md documentation
Document the context window discussion and design decisions:
- Industry approaches (MoE, Ensemble, Pipeline, Speculative)
- Memory offloading options and trade-offs
- Why KV cache can't be shared between workers
- Three architectural options for 30K-60K+ context
- Current implementation status
- Hardware-specific recommendations

Provides reference for future enhancements and helps users
understand memory constraints in swarm architectures.
2026-02-23 20:19:46 +01:00
sleepy e794fe29d4 Fix critical bugs, concurrency issues, and code quality across codebase
- Fix asyncio.create_task() crash in zeroconf background thread (discovery.py)
- Fix int(bytes) TypeError in peer property decoding (discovery.py)
- Fix unreachable Android/Qualcomm GPU detection path (detector.py)
- Add nvmlShutdown() to prevent NVML resource leak (detector.py)
- Wrap blocking inference in asyncio.to_thread() to unblock event loop (llamacpp.py, mlx.py)
- Initialize and use asyncio.Lock for concurrency safety (llamacpp.py)
- Fix VRAM regex matching GPU index instead of byte value (amd.py)
- Implement best_of_n federation strategy (was dead code) (federation.py)
- Lazy-import aiohttp/mcp to avoid hard ImportError (federation.py, mcp_server.py)
- Fix response_model conflict with streaming responses (routes.py)
- Fix CORS allow_origins=* with allow_credentials=True violation (server.py)
- Fix memory calculation using pre-clamped instance count (selector.py)
- Fix calculate_max_instances returning 2 when only 0-1 fit (selector.py)
- Atomic downloads via .part file to prevent caching partial files (downloader.py)
- Replace recursive menu navigation with loop-based approach (interactive.py)
- Implement actual majority voting in _majority_vote (consensus.py)
- Fix false-positive list detection in quality scoring (consensus.py)
- Replace 15+ bare except: with except Exception: across codebase
- Fix .json() -> .model_dump_json() for Pydantic v2 (routes.py)
- Remove unused MCP imports, add empty prompt validation (mcp_server.py)
- Use tokenizer for accurate MLX token counting (mlx.py)
- Fix memory estimate from FP32 (*4) to quantized (*0.6) (llamacpp.py)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 20:11:58 +01:00
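The to_thread-plus-lock pattern from this commit as a sketch (class and attribute names illustrative): the blocking llama.cpp call runs in a thread so the event loop stays responsive, and a lock serializes access because the underlying context is not thread-safe.

```python
import asyncio

class LlamaCppBackend:
    def __init__(self, llm):
        self._llm = llm                  # a llama_cpp.Llama instance
        self._lock = asyncio.Lock()

    async def generate(self, prompt: str, max_tokens: int = 1024) -> str:
        async with self._lock:           # one inference at a time per context
            result = await asyncio.to_thread(self._llm, prompt, max_tokens=max_tokens)
        return result["choices"][0]["text"]
```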
sleepy d68eda45d8 Fix .gitignore to allow src/models/ directory
The .gitignore had 'models/' which excluded both:
- The models/ cache directory at root (intended)
- The src/models/ module directory (NOT intended)

Changed to '/models/' to only exclude root-level models/ directory
while allowing src/models/ to be tracked.

This fixes the 'No module named models' error on fresh clones.
2026-02-23 19:51:40 +01:00
sleepy 6e10438914 Fix Windows import path issue
Add more robust path resolution for Windows:
- Use Path.resolve() to get absolute path
- Also add parent directory to sys.path
- Fixes 'No module named models' error on Windows

Users can now run:
  python main.py --test

Or use the module approach:
  python -m local_swarm --test
2026-02-23 19:24:33 +01:00
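The fix amounts to resolving the script's own location before touching sys.path; a sketch assuming main.py sits at the repo root:

```python
import sys
from pathlib import Path

ROOT = Path(__file__).resolve().parent       # absolute path, Windows-safe
for p in (ROOT, ROOT.parent):
    if str(p) not in sys.path:
        sys.path.insert(0, str(p))
```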
sleepy 77f26f7381 Update PLAN.md and README with documentation completion
Update Phase 8.3 Documentation to mark as COMPLETED:
- Document all sections added to docs/GUIDE.md
- Update README.md with documentation links

Documentation now includes:
- Quick Start Guide for all platforms
- Opencode configuration examples
- API reference with examples
- Comprehensive troubleshooting
- Performance tuning guide
- Advanced configuration options
2026-02-23 18:40:35 +01:00
sleepy 1788087145 Add comprehensive documentation
Create docs/GUIDE.md with complete documentation:
- Quick Start Guide for all platforms
- Opencode configuration examples:
  - Basic configuration
  - Remote machine setup
  - Multiple model options
  - Environment-specific configs
  - Testing instructions
- API Reference:
  - All OpenAI-compatible endpoints
  - Federation endpoints
  - Request/response examples
- Troubleshooting Guide:
  - Common issues and solutions
  - Platform-specific problems
  - Installation issues
- Advanced Configuration:
  - config.yaml options
  - Environment variables
- Performance Tuning:
  - Speed vs quality settings
  - Memory usage tables
  - Recommended configurations
- MCP Server setup and usage
- Network Federation guide

Update README.md:
- Add Documentation section with links
- Reference the complete guide

Documentation now covers:
- Installation on all platforms
- Opencode integration
- API usage
- Troubleshooting
- Performance optimization
- Advanced features
2026-02-23 18:39:56 +01:00
sleepy 08a5b800d0 Phase 7: Add AMD, Intel, and Qualcomm GPU support
Add src/hardware/amd.py:
- AMD GPU detection via ROCm (rocm-smi)
- Windows AMD detection via PowerShell/WMI
- Fallback to PCI detection on Linux
- VRAM parsing from ROCm output
- Driver version detection
- Supports Radeon RX series and other AMD GPUs

Add src/hardware/intel.py:
- Intel GPU detection via OneAPI (sycl-ls)
- OpenCL fallback detection
- Windows Intel detection via PowerShell
- Arc, Iris Xe, UHD graphics support
- VRAM estimation for discrete vs integrated
- Driver version detection

Add src/hardware/qualcomm.py:
- Qualcomm Snapdragon detection for Android/Termux
- Multi-method detection (cpuinfo, hardware, getprop)
- Termux environment detection
- Adreno GPU model extraction
- RAM-based VRAM estimation (25% of total)
- Setup requirements checking
- Device model name retrieval

Update src/hardware/detector.py:
- Add is_mobile flag to GPUInfo dataclass
- Update detect_gpu() to check all GPU vendors
- Priority: NVIDIA > AMD > Intel > Qualcomm
- Add detect_qualcomm() helper function

All detection modules support:
- Multiple detection methods with fallbacks
- Platform-specific implementations (Linux/Windows/Android)
- Graceful handling of missing tools/drivers
- Consistent GPUInfo return format

Phase 7 complete: Extended GPU support for AMD, Intel, and Qualcomm/Adreno GPUs.
2026-02-23 18:35:13 +01:00
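A sketch of the priority dispatch (the probe functions are stubs here; the real ones shell out to nvidia-smi, rocm-smi, sycl-ls, getprop, etc.):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class GPUInfo:
    vendor: str
    name: str
    vram_mb: int
    is_mobile: bool = False

def detect_nvidia() -> Optional[GPUInfo]: ...
def detect_amd() -> Optional[GPUInfo]: ...
def detect_intel() -> Optional[GPUInfo]: ...
def detect_qualcomm() -> Optional[GPUInfo]: ...

def detect_gpu() -> Optional[GPUInfo]:
    for probe in (detect_nvidia, detect_amd, detect_intel, detect_qualcomm):
        try:
            info = probe()        # each probe has its own internal fallbacks
        except Exception:
            info = None           # a missing tool or driver is never fatal
        if info is not None:
            return info
    return None
```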
sleepy 765f26cd49 Add comprehensive Tips & Help menu
Add new menu option [t] Tips & Help:
- Model Recommendations: Ranked list of best coding models
  - Qwen 2.5 Coder (best overall)
  - DeepSeek Coder (great alternative)
  - CodeLlama (solid choice)
  - Size recommendations (1-3B, 7B, 13B+)

- Quantization Guide: Simple explanation of Q4/Q5/Q6
  - What quantization is
  - Trade-offs between levels
  - File size comparison
  - When to use each level
  - Quick reference table

- Instance Count Tips: Research-based recommendations
  - Minimum 2 instances (required for consensus)
  - Sweet spot: 3-5 instances (85-90% of benefit)
  - Maximum 8 instances (diminishing returns)
  - Memory calculation examples
  - Research note on consensus effectiveness

- Hardware Optimization: Tips specific to user's setup
  - Apple Silicon (MLX backend tips)
  - Discrete GPU (CUDA/ROCm optimization)
  - CPU-only (practical limitations)
  - General speed vs quality trade-offs
  - Memory management best practices

All tips are shown in interactive format with clear sections,
practical advice, and hardware-specific recommendations based on
detected system specs.
2026-02-23 18:10:21 +01:00
sleepy 74bbca18bd Phase 6: Network Federation (#2)
* Add exit menu option

Add [q] Quit option to interactive menu:
- Allows user to exit without starting the swarm
- Shows 'Exiting...' message
- Returns None to gracefully exit main.py

* Phase 6: Implement network federation (WIP)

Add src/network/discovery.py:
- SwarmDiscovery class using mDNS/Bonjour
- PeerInfo dataclass for peer metadata
- Automatic peer discovery on local network
- Service advertising for this swarm
- Stale peer detection and cleanup

Add src/network/federation.py:
- FederationClient for HTTP communication with peers
- FederatedSwarm for managing cross-swarm consensus
- Two-phase voting: local consensus then peer voting
- Weighted voting strategy based on confidence
- Federation status monitoring
- Peer health checking

Add src/network/__init__.py:
- Export network classes

Update src/api/routes.py:
- POST /v1/federation/vote - Receive votes from peers
- GET /v1/federation/status - Get federation status
- GET /v1/federation/peers - List discovered peers

Update requirements.txt:
- Add zeroconf for mDNS discovery

Features:
- Auto-discovery of other Local Swarm instances
- Cross-swarm consensus voting
- Configurable minimum peer requirements
- Fallback to local-only if no peers available
- Peer health monitoring

TODO:
- Integrate federation into main.py
- Add --federation flag
- Test multi-machine setup
2026-02-23 18:06:43 +01:00
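Advertising a swarm over mDNS with zeroconf might look like this (service type, address, and properties are illustrative, not the project's actual values):

```python
import socket
from zeroconf import ServiceInfo, Zeroconf

info = ServiceInfo(
    type_="_localswarm._tcp.local.",
    name="my-swarm._localswarm._tcp.local.",
    addresses=[socket.inet_aton("192.168.1.50")],
    port=17615,
    properties={"model": "qwen2.5-coder-7b", "workers": "3"},
)
zc = Zeroconf()
zc.register_service(info)   # peers browsing _localswarm._tcp will discover this swarm
# ... on shutdown: zc.unregister_service(info); zc.close()
```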
sleepy 2f547fe101 Phase 6: Network Federation (#1)
* Update PLAN.md with new phases

- Add Phase 5: CLI & Interactive Interface
  - Interactive menu system with 3 options
  - Hardware display with detailed specs
  - Resource usage monitoring
  - Custom configuration wizard

- Add Phase 5.5: MCP Server
  - MCP protocol implementation
  - 5 MCP tools for AI assistants
  - Dual server mode (HTTP + MCP)

- Reorganize phase structure for clarity

* Phase 6: Implement network federation (WIP)

Add src/network/discovery.py:
- SwarmDiscovery class using mDNS/Bonjour
- PeerInfo dataclass for peer metadata
- Automatic peer discovery on local network
- Service advertising for this swarm
- Stale peer detection and cleanup

Add src/network/federation.py:
- FederationClient for HTTP communication with peers
- FederatedSwarm for managing cross-swarm consensus
- Two-phase voting: local consensus then peer voting
- Weighted voting strategy based on confidence
- Federation status monitoring
- Peer health checking

Add src/network/__init__.py:
- Export network classes

Update src/api/routes.py:
- POST /v1/federation/vote - Receive votes from peers
- GET /v1/federation/status - Get federation status
- GET /v1/federation/peers - List discovered peers

Update requirements.txt:
- Add zeroconf for mDNS discovery

Features:
- Auto-discovery of other Local Swarm instances
- Cross-swarm consensus voting
- Configurable minimum peer requirements
- Fallback to local-only if no peers available
- Peer health monitoring

TODO:
- Integrate federation into main.py
- Add --federation flag
- Test multi-machine setup
2026-02-23 18:05:27 +01:00
sleepy 3ff988b9ba Fix bugs and add model update feature
Fix duplicate instances bug:
- Remove 'instances' from label in list_available_configurations()
- Now shows correctly as 'Model Size (quant)' with 'X instances' in description

Add more models to registry:
- Llama 3.2 (3B, 1B)
- Phi-4 (4B)
- Gemma 2 (2B, 4B, 9B)
- StarCoder2 (3B, 7B, 15B)
- Updated HF repo mappings and filename patterns

Add model update mechanism (src/models/updater.py):
- ModelUpdater class for querying HuggingFace Hub
- Queries trending GGUF models tagged with 'code'
- Filters out already-known models
- Estimates VRAM from model name
- 30-minute rate limiting between checks
- Saves custom models to ~/.local_swarm/custom_models.json
- Manual check only (no auto-update to avoid overloading HF)

Add menu option '4 - Check for New Models':
- Queries HF for trending models (respects rate limits)
- Displays model info (name, downloads, likes, est. VRAM)
- Allows adding models to custom registry
- Returns to model selection after

About etcd:
- Not needed for home networks
- mDNS (Bonjour) is simpler and requires no central server
- Perfect for 2-5 machine setups
- Zero configuration, auto-discovery

Changes to interactive.py:
- Added option 4 to main menu
- Added check_for_new_models_menu() function
- Displays trending models with metadata
- Allows manual addition to custom registry
2026-02-23 18:02:50 +01:00
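The HF query itself can be quite small; a sketch using huggingface_hub, with downloads as a proxy for trending (tags and limit are illustrative):

```python
from huggingface_hub import HfApi

api = HfApi()
for model in api.list_models(filter=["gguf", "code"], sort="downloads",
                             direction=-1, limit=10):
    print(model.id, model.downloads, model.likes)
```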
sleepy ac8a90f2bf Update PLAN.md with new phases
- Add Phase 5: CLI & Interactive Interface
  - Interactive menu system with 3 options
  - Hardware display with detailed specs
  - Resource usage monitoring
  - Custom configuration wizard

- Add Phase 5.5: MCP Server
  - MCP protocol implementation
  - 5 MCP tools for AI assistants
  - Dual server mode (HTTP + MCP)

- Reorganize phase structure for clarity
2026-02-23 17:48:49 +01:00
sleepy b9669e415d Update README with interactive menu documentation
- Add interactive mode section with screenshots
- Document the 3 menu options (recommended, browse, custom)
- Add startup summary section showing what info is displayed
- Add interactive features and MCP server to features list
- Document --auto flag to skip menu
- Add hardware/resource usage display examples
2026-02-23 17:44:44 +01:00
sleepy 1e183bd4cc Add interactive menu system and startup summary
Add src/interactive.py:
- Interactive model selection menu with 3 options:
  1. Recommended Configuration (auto-detect best)
  2. Browse All Configurations (see all feasible models)
  3. Custom Configuration (user-specified model + instances)
- Hardware info display with detailed specs
- Resource usage monitoring showing:
  - Swarm status, model, workers
  - Memory usage per worker
  - Worker statistics (requests, latency, tokens/sec)
- Custom configuration wizard:
  - Select from available models
  - Choose model size (3B, 7B, 14B, etc.)
  - Pick quantization level (Q4, Q5, Q6)
  - Specify number of instances
- Runtime menu for monitoring (refresh/quit)

Update main.py:
- Default mode now shows interactive menu
- Add --auto flag to skip menu and use recommended config
- Show comprehensive startup summary with hardware + config + usage
- Better integration with interactive module
- Removed redundant print functions (now in interactive.py)

Features:
- Clear screen for clean menu display
- Formatted headers and sections
- Menu validation and error handling
- Memory utilization percentage display
- Real-time worker status with health indicators
2026-02-23 17:43:38 +01:00
sleepy d3d2c50c71 Update README with MCP server documentation
- Add MCP Server section explaining the --mcp flag
- Document the 5 MCP tools available to AI assistants
- Add --mcp to CLI Options section
- Explain benefits of MCP integration for automatic hardware queries
2026-02-23 17:38:48 +01:00
sleepy cc0ee08b6f Phase 5: Add MCP server support alongside HTTP API
Add src/mcp_server.py:
- LocalSwarmMCPServer class implementing MCP protocol
- 5 MCP tools exposed:
  - get_hardware_info: Check CPU, GPU, RAM
  - get_swarm_status: Get worker status and model info
  - generate_code: Generate with consensus voting
  - list_available_models: Show all runnable models
  - get_worker_details: Detailed worker statistics
- Integration with SwarmManager for code generation
- Stdio transport for AI assistant communication

Update requirements.txt:
- Add mcp>=1.0.0 dependency

Update main.py:
- Add --mcp flag to enable MCP server
- Run MCP server alongside HTTP API when enabled
- Both servers share the same SwarmManager instance
- Display MCP status in startup message

Now Local Swarm supports both:
- HTTP API (for external clients, curl, opencode)
- MCP server (for tight AI assistant integration)

Usage:
  python main.py              # HTTP API only
  python main.py --mcp        # HTTP API + MCP server

MCP tools allow AI assistants to:
- Query hardware capabilities before suggesting models
- Check swarm health and worker status
- Generate code with automatic consensus voting
- List available models for the hardware
2026-02-23 17:37:55 +01:00
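For flavor, exposing one such tool over stdio with the mcp package's FastMCP helper might look like this (a sketch; the project may use the lower-level Server API, and the hardware values are hard-coded stand-ins):

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("local-swarm")

@mcp.tool()
def get_hardware_info() -> dict:
    """Report CPU, GPU and RAM so an assistant can pick a suitable model."""
    return {"gpu": "Apple M2", "ram_gb": 16}   # real code would detect these

if __name__ == "__main__":
    mcp.run()   # stdio transport by default
```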
sleepy 4367c79d83 Phase 4: Implement OpenAI-compatible API server
Add src/api/models.py:
- Pydantic models for OpenAI API compatibility
- ChatCompletionRequest/Response models
- Streaming response models (SSE format)
- Model listing and health check models

Add src/api/routes.py:
- POST /v1/chat/completions endpoint
- GET /v1/models endpoint
- GET /health and /v1/health endpoints
- Support for streaming (text/event-stream) and regular responses
- Message formatting for chat prompts
- Error handling with proper HTTP status codes

Add src/api/server.py:
- FastAPI application with CORS middleware
- Lifespan context for startup/shutdown
- Integration with SwarmManager
- Uvicorn server configuration

Update src/api/__init__.py:
- Export API classes and functions

Update main.py:
- Integrate API server into default workflow
- Start API server on http://127.0.0.1:PORT
- Show API endpoints and opencode configuration
- Graceful shutdown on Ctrl+C

Update AGENTS.md:
- Add note about Python support in MCP server

Phase 4 complete: Local Swarm now exposes OpenAI-compatible API at:
- POST /v1/chat/completions (with streaming support)
- GET /v1/models
- GET /health

Ready for use with opencode and other OpenAI-compatible clients.
2026-02-23 17:29:16 +01:00
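Because the API is OpenAI-compatible, any OpenAI client can be pointed at it; port and model name below are illustrative:

```python
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="unused")
resp = client.chat.completions.create(
    model="local-swarm",
    messages=[{"role": "user", "content": "Write a Python hello world."}],
)
print(resp.choices[0].message.content)
```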
sleepy 2ce3e138c1 Phase 3: Implement swarm management and consensus
Add src/swarm/worker.py:
- SwarmWorker class managing single LLM instance
- WorkerStats for tracking performance metrics
- WorkerInfo dataclass for status reporting
- Async generation with streaming support
- Health monitoring and graceful shutdown

Add src/swarm/consensus.py:
- ConsensusEngine with multiple voting strategies
- Similarity voting using sentence-transformers embeddings
- Quality voting based on code structure and completeness
- Fastest voting for low-latency scenarios
- Majority voting as fallback
- Confidence scoring for all strategies

Add src/swarm/manager.py:
- SwarmManager orchestrating multiple workers
- Parallel request distribution to all workers
- Integration with consensus engine
- Streaming support from fastest worker
- Status monitoring and health checks
- Graceful shutdown coordination

Update src/swarm/__init__.py:
- Export main classes for easy importing

Update main.py:
- Add --test mode for sample inference
- Integrate SwarmManager initialization
- Show inference results and consensus details
- Keep swarm running until interrupted
- Better error handling and status display

Phase 3 complete: Swarm can spawn N workers, generate responses,
and run consensus voting to select the best output.
2026-02-23 17:22:54 +01:00
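A sketch of the similarity strategy: embed every candidate and pick the most central one, using mean pairwise cosine similarity as the confidence (scoring details are illustrative; the MiniLM model name matches the embedder mentioned elsewhere in this log):

```python
from sentence_transformers import SentenceTransformer, util

def similarity_vote(responses: list[str]) -> tuple[str, float]:
    model = SentenceTransformer("all-MiniLM-L6-v2")
    emb = model.encode(responses, convert_to_tensor=True)
    sims = util.cos_sim(emb, emb)          # pairwise cosine similarity matrix
    centrality = sims.mean(dim=1)          # average agreement per response
    best = int(centrality.argmax())
    return responses[best], float(centrality[best])   # winner + confidence
```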
sleepy 6d7f323bd4 Phase 2: Implement backend integration and model downloading
Add src/backends/base.py:
- Abstract base class LLMBackend with async interface
- GenerationRequest/GenerationResponse dataclasses
- BackendError exception hierarchy

Add src/backends/llamacpp.py:
- llama.cpp backend for GGUF models
- Supports GPU offloading (CUDA/ROCm/Metal)
- Streaming and non-streaming generation
- Memory usage tracking

Add src/backends/mlx.py:
- MLX backend for Apple Silicon
- Optimized for Metal performance
- Unified memory model support

Add src/backends/__init__.py:
- Backend factory with auto-detection
- Selects MLX for Apple Silicon, llama.cpp for others
- Auto-configures GPU layers

Add src/models/downloader.py:
- HuggingFace model downloader
- Progress bar display with tqdm
- Cache management in ~/.local_swarm/models
- Support for all registered models

Update main.py:
- Integrate model downloading (--download-only mode)
- Test backend loading after download
- Async support for backend operations
- Better error handling and reporting

Phase 2 complete: Models can be downloaded and backends can load them.
2026-02-23 17:15:37 +01:00
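The factory's auto-detection reduces to a platform check; a minimal sketch:

```python
import platform
import sys

def pick_backend() -> str:
    if sys.platform == "darwin" and platform.machine() == "arm64":
        return "mlx"         # unified memory + Metal
    return "llamacpp"        # GGUF via llama.cpp (CUDA/ROCm/CPU)
```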
sleepy 1940b40be5 Update documentation: Add network federation, extended GPU support, and Android
PLAN.md updates:
- Add Phase 6: Local Network Federation with mDNS discovery
- Add Phase 7: Extended GPU support (AMD, Intel, Qualcomm)
- Update architecture diagram with network modules
- Add federation architecture diagram
- Update test coverage for all platforms

README.md updates:
- Add network federation features and configuration
- Add hardware support for AMD, Intel, Qualcomm GPUs
- Add Android/Termux installation instructions
- Update hardware detection section
- Update supported models table with more hardware examples
- Add federated swarm architecture diagram
- Add troubleshooting for AMD, Intel, Android
- Update acknowledgments with new dependencies

New todos added for:
- Network federation implementation
- AMD GPU support (ROCm)
- Intel GPU support (OneAPI)
- Android/Termux support
2026-02-23 17:05:59 +01:00
sleepy 0e08a2d66a Phase 1: Implement hardware detection and model selection
- Add src/hardware/detector.py with cross-platform GPU/CPU/RAM detection
- Add src/models/registry.py with model database (Qwen, DeepSeek, CodeLlama)
- Add src/models/selector.py with optimal model selection algorithm
- Update main.py to use new modules and display results

Features:
- Detects NVIDIA GPUs on Windows/Linux
- Detects Apple Silicon on macOS
- Calculates available memory based on platform (100% GPU VRAM, 50% unified RAM)
- Selects optimal model, quantization, and instance count
- Supports 2-8 instances with quality-based selection
2026-02-23 16:56:07 +01:00
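The memory budget rule as code (illustrative): dedicated VRAM is committed fully, while unified memory is halved because the OS shares it:

```python
def available_memory_mb(vram_mb: int, unified: bool) -> int:
    return vram_mb // 2 if unified else vram_mb

assert available_memory_mb(16384, unified=True) == 8192     # 16 GB M-series -> 8 GB budget
assert available_memory_mb(12288, unified=False) == 12288   # 12 GB GPU -> full 12 GB
```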
sleepy 8cf1e16703 Initial commit: Local Swarm project structure and documentation 2026-02-23 16:46:31 +01:00