- select_optimal_model was checking the HF API for available quantizations
- This caused the menu to hang or respond slowly when changing context
- Now only checks availability when browsing or custom config
- Recommended config uses default quantizations (faster)
- Update list_models() and build_models() to accept check_available parameter
- Update interactive.py to pass check_available=True on Mac
- Menu now filters out non-existent quantizations in real-time
- Users can only select quantizations that actually exist on HF
This prevents the issue where a user selects 4bit but the system
tries to download 5bit, since only certain quants exist for each model.
- Add filter_available_mlx_quants() to check HuggingFace for existing repos
- Update build_model_variants() to optionally check availability
- Menu will now only show quantizations that actually exist
- Prevents users from selecting non-existent quantizations
Note: this adds a small delay when building the model list, since it
queries the HF API, but it prevents download failures later.
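The availability filter described above can be sketched as follows (the `{base_repo}-{quant}` repo naming and the injected `repo_exists` callable are assumptions of this sketch; the real code queries the HuggingFace API directly):

```python
from typing import Callable, List

def filter_available_mlx_quants(
    base_repo: str,
    quants: List[str],
    repo_exists: Callable[[str], bool],
) -> List[str]:
    """Keep only quantizations whose MLX repo actually exists.

    repo_exists is injected so the check can be stubbed offline;
    in production it could wrap huggingface_hub's repo lookup.
    """
    return [q for q in quants if repo_exists(f"{base_repo}-{q}")]
```

Injecting the existence check keeps the filtering logic testable without network access.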
- Add _validate_mlx_model_exists() to check HuggingFace repos
- Show warning when selected quantization doesn't exist
- List available quantizations for the model
- Better error messages with suggestions
This prevents trying to download non-existent quantizations like 5bit
when only 3bit, 4bit, 6bit, 8bit are available.
Updated _try_model_with_context and _try_smallest_variant_with_context:
- On Mac (use_mlx=True): Returns 3 responses by default
- On other platforms: Still calculates based on VRAM
- Memory calculation fixed for Mac (doesn't multiply by response count)
Fixes issue where recommended config showed 'Responses: 1' on Mac
On Mac (Apple Silicon) with seed variation:
- Total memory no longer multiplied by number of responses
- Memory is shared across all responses (same model, different seeds)
- list_available_configurations: Uses 3 responses, single memory calculation
- custom_configuration: Memory doesn't scale with response count
- show_startup_summary: Shows '(shared)' for RAM on Mac
- All memory displays now accurate for seed variation mode
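The fixed memory rule boils down to something like this (function name and signature are illustrative, not from the source):

```python
def estimate_total_memory_gb(model_size_gb: float, n_responses: int,
                             use_seed_variation: bool) -> float:
    """With seed variation (one model, many seeds) the weights are
    shared, so total memory does not scale with the response count;
    otherwise each instance loads its own copy of the model."""
    if use_seed_variation:
        return model_size_gb
    return model_size_gb * n_responses
```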
- On Apple Silicon, UI terminology changed from 'instances' to 'responses'
- Mac default: 3 responses (configurable 2-5)
- Non-Mac: Still uses memory-based calculation
- Added explanation that seed variation keeps memory constant
- Menu and prompts updated to show appropriate terminology
- Add use_seed_variation mode: Generate multiple responses from one model
with different random seeds (saves memory on Apple Silicon)
- Add enable_reviewer mode: A critic worker validates consensus results
and triggers retries if output looks suspicious
- Add generate_with_seed_variation() method for single-model multi-response
- Add generate_with_reviewer() method with feedback loop
- Auto-enable seed variation on Apple Silicon to save memory
- Configurable max_retries for reviewer mode
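generate_with_seed_variation() presumably does something along these lines (a minimal sketch; the backend call's `generate(prompt, seed)` signature is an assumption):

```python
import asyncio
from typing import Awaitable, Callable, List

async def generate_with_seed_variation(
    generate: Callable[[str, int], Awaitable[str]],
    prompt: str,
    n_responses: int = 3,
    base_seed: int = 42,
) -> List[str]:
    """Run one loaded model n_responses times with different seeds.

    Because the same model instance serves every call, memory stays
    constant while still producing diverse responses for consensus."""
    tasks = [generate(prompt, base_seed + i) for i in range(n_responses)]
    return list(await asyncio.gather(*tasks))
```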
- Add Tool, ToolCall, FunctionDefinition models
- Format prompts with tool descriptions for Qwen models
- Parse tool calls from model output (JSON and function call patterns)
- Auto-disable streaming when tools are present
- Return tool_calls in API response with proper finish_reason
- Support both simple function calls and JSON tool_calls format
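Parsing the JSON tool_calls format might look roughly like this (a sketch handling only the JSON-object pattern; the real parser also matches bare function-call syntax):

```python
import json
from typing import List

def parse_tool_calls(text: str) -> List[dict]:
    """Scan model output for JSON objects shaped like
    {"name": ..., "arguments": {...}} and collect them as tool calls."""
    decoder = json.JSONDecoder()
    calls = []
    i = 0
    while (i := text.find("{", i)) != -1:
        try:
            # raw_decode parses one balanced JSON value starting at i
            obj, end = decoder.raw_decode(text, i)
        except json.JSONDecodeError:
            i += 1  # not valid JSON here; try the next brace
            continue
        if isinstance(obj, dict) and "name" in obj:
            calls.append({"name": obj["name"],
                          "arguments": obj.get("arguments", {})})
        i = end
    return calls
```

Using `raw_decode` instead of a regex keeps nested argument objects intact.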
Document the context window discussion and design decisions:
- Industry approaches (MoE, Ensemble, Pipeline, Speculative)
- Memory offloading options and trade-offs
- Why KV cache can't be shared between workers
- Three architectural options for 30K-60K+ context
- Current implementation status
- Hardware-specific recommendations
Provides reference for future enhancements and helps users
understand memory constraints in swarm architectures.
The .gitignore had 'models/', which excluded both:
- the models/ cache directory at root (intended)
- the src/models/ module directory (NOT intended)
Changed it to '/models/' so only the root-level models/ directory is
excluded while src/models/ can be tracked.
This fixes the 'No module named models' error on fresh clones.
Add more robust path resolution for Windows:
- Use Path.resolve() to get absolute path
- Also add parent directory to sys.path
- Fixes 'No module named models' error on Windows
Users can now run:
    python main.py --test
or use the module approach:
    python -m local_swarm --test
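The path fix can be illustrated with a small helper (`add_module_paths` is a hypothetical name; in main.py the same logic would run at import time as `add_module_paths(__file__, sys.path)`):

```python
from pathlib import Path
from typing import List

def add_module_paths(file_path: str, search_path: List[str]) -> List[str]:
    """Prepend the script's directory and its parent (made absolute via
    Path.resolve()) so 'import models' works regardless of the current
    working directory, including on Windows."""
    here = Path(file_path).resolve()
    for p in (here.parent, here.parent.parent):
        if str(p) not in search_path:
            search_path.insert(0, str(p))
    return search_path
```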
Update Phase 8.3 Documentation to mark as COMPLETED:
- Document all sections added to docs/GUIDE.md
- Update README.md with documentation links
Documentation now includes:
- Quick Start Guide for all platforms
- Opencode configuration examples
- API reference with examples
- Comprehensive troubleshooting
- Performance tuning guide
- Advanced configuration options
Add new menu option [t] Tips & Help:
- Model Recommendations: Ranked list of best coding models
- Qwen 2.5 Coder (best overall)
- DeepSeek Coder (great alternative)
- CodeLlama (solid choice)
- Size recommendations (1-3B, 7B, 13B+)
- Quantization Guide: Simple explanation of Q4/Q5/Q6
- What quantization is
- Trade-offs between levels
- File size comparison
- When to use each level
- Quick reference table
- Instance Count Tips: Research-based recommendations
- Minimum 2 instances (required for consensus)
- Sweet spot: 3-5 instances (85-90% of benefit)
- Maximum 8 instances (diminishing returns)
- Memory calculation examples
- Research note on consensus effectiveness
- Hardware Optimization: Tips specific to user's setup
- Apple Silicon (MLX backend tips)
- Discrete GPU (CUDA/ROCm optimization)
- CPU-only (practical limitations)
- General speed vs quality trade-offs
- Memory management best practices
All tips are shown in interactive format with clear sections,
practical advice, and hardware-specific recommendations based on
detected system specs.
* Add exit menu option
Add [q] Quit option to interactive menu:
- Allows user to exit without starting the swarm
- Shows 'Exiting...' message
- Returns None to gracefully exit main.py
* Phase 6: Implement network federation (WIP)
Add src/network/discovery.py:
- SwarmDiscovery class using mDNS/Bonjour
- PeerInfo dataclass for peer metadata
- Automatic peer discovery on local network
- Service advertising for this swarm
- Stale peer detection and cleanup
Add src/network/federation.py:
- FederationClient for HTTP communication with peers
- FederatedSwarm for managing cross-swarm consensus
- Two-phase voting: local consensus then peer voting
- Weighted voting strategy based on confidence
- Federation status monitoring
- Peer health checking
Add src/network/__init__.py:
- Export network classes
Update src/api/routes.py:
- POST /v1/federation/vote - Receive votes from peers
- GET /v1/federation/status - Get federation status
- GET /v1/federation/peers - List discovered peers
Update requirements.txt:
- Add zeroconf for mDNS discovery
Features:
- Auto-discovery of other Local Swarm instances
- Cross-swarm consensus voting
- Configurable minimum peer requirements
- Fallback to local-only if no peers available
- Peer health monitoring
TODO:
- Integrate federation into main.py
- Add --federation flag
- Test multi-machine setup
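The weighted voting strategy can be sketched as follows (illustrative only; the real FederatedSwarm also enforces minimum-peer requirements and falls back to local-only consensus):

```python
from collections import defaultdict
from typing import List, Tuple

def weighted_vote(votes: List[Tuple[str, float]]) -> str:
    """Pick the answer with the highest summed confidence across all
    voters (the local swarm and each peer contribute one vote)."""
    totals = defaultdict(float)
    for answer, confidence in votes:
        totals[answer] += confidence
    return max(totals, key=totals.get)
```

Summing confidences lets two moderately confident peers outvote one highly confident one, which is one reasonable reading of "weighted voting based on confidence".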
* Update PLAN.md with new phases
- Add Phase 5: CLI & Interactive Interface
- Interactive menu system with 3 options
- Hardware display with detailed specs
- Resource usage monitoring
- Custom configuration wizard
- Add Phase 5.5: MCP Server
- MCP protocol implementation
- 5 MCP tools for AI assistants
- Dual server mode (HTTP + MCP)
- Reorganize phase structure for clarity
Fix duplicate instances bug:
- Remove 'instances' from label in list_available_configurations()
- Now shows correctly as 'Model Size (quant)' with 'X instances' in description
Add more models to registry:
- Llama 3.2 (3B, 1B)
- Phi-4 (4B)
- Gemma 2 (2B, 4B, 9B)
- StarCoder2 (3B, 7B, 15B)
- Updated HF repo mappings and filename patterns
Add model update mechanism (src/models/updater.py):
- ModelUpdater class for querying HuggingFace Hub
- Queries trending GGUF models tagged with 'code'
- Filters out already-known models
- Estimates VRAM from model name
- 30-minute rate limiting between checks
- Saves custom models to ~/.local_swarm/custom_models.json
- Manual check only (no auto-update to avoid overloading HF)
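The 30-minute rate limit could be implemented along these lines (the class name is illustrative, not the actual ModelUpdater API):

```python
import time

class RateLimitedCheck:
    """Allow an expensive query (e.g. a HuggingFace Hub search) at most
    once per interval; repeated calls inside the window are refused."""

    def __init__(self, interval_s: float = 30 * 60):
        self.interval_s = interval_s
        self._last = 0.0

    def allowed(self, now: float = None) -> bool:
        # `now` is injectable for testing; defaults to a monotonic clock
        now = time.monotonic() if now is None else now
        if now - self._last >= self.interval_s:
            self._last = now
            return True
        return False
```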
Add menu option '4 - Check for New Models':
- Queries HF for trending models (respects rate limits)
- Displays model info (name, downloads, likes, est. VRAM)
- Allows adding models to custom registry
- Returns to the model selection menu afterwards
About etcd:
- Not needed for home networks
- mDNS (Bonjour) is simpler and requires no central server
- Perfect for 2-5 machine setups
- Zero configuration, auto-discovery
Changes to interactive.py:
- Added option 4 to main menu
- Added check_for_new_models_menu() function
- Displays trending models with metadata
- Allows manual addition to custom registry
- Add interactive mode section with screenshots
- Document the 3 menu options (recommended, browse, custom)
- Add startup summary section showing what info is displayed
- Add interactive features and MCP server to features list
- Document --auto flag to skip menu
- Add hardware/resource usage display examples
Add src/interactive.py:
- Interactive model selection menu with 3 options:
1. Recommended Configuration (auto-detect best)
2. Browse All Configurations (see all feasible models)
3. Custom Configuration (user-specified model + instances)
- Hardware info display with detailed specs
- Resource usage monitoring showing:
- Swarm status, model, workers
- Memory usage per worker
- Worker statistics (requests, latency, tokens/sec)
- Custom configuration wizard:
- Select from available models
- Choose model size (3B, 7B, 14B, etc.)
- Pick quantization level (Q4, Q5, Q6)
- Specify number of instances
- Runtime menu for monitoring (refresh/quit)
Update main.py:
- Default mode now shows interactive menu
- Add --auto flag to skip menu and use recommended config
- Show comprehensive startup summary with hardware + config + usage
- Better integration with interactive module
- Removed redundant print functions (now in interactive.py)
Features:
- Clear screen for clean menu display
- Formatted headers and sections
- Menu validation and error handling
- Memory utilization percentage display
- Real-time worker status with health indicators
- Add MCP Server section explaining the --mcp flag
- Document the 5 MCP tools available to AI assistants
- Add --mcp to CLI Options section
- Explain benefits of MCP integration for automatic hardware queries
Add src/mcp_server.py:
- LocalSwarmMCPServer class implementing MCP protocol
- 5 MCP tools exposed:
- get_hardware_info: Check CPU, GPU, RAM
- get_swarm_status: Get worker status and model info
- generate_code: Generate with consensus voting
- list_available_models: Show all runnable models
- get_worker_details: Detailed worker statistics
- Integration with SwarmManager for code generation
- Stdio transport for AI assistant communication
Update requirements.txt:
- Add mcp>=1.0.0 dependency
Update main.py:
- Add --mcp flag to enable MCP server
- Run MCP server alongside HTTP API when enabled
- Both servers share the same SwarmManager instance
- Display MCP status in startup message
Now Local Swarm supports both:
- HTTP API (for external clients, curl, opencode)
- MCP server (for tight AI assistant integration)
Usage:
python main.py # HTTP API only
python main.py --mcp # HTTP API + MCP server
MCP tools allow AI assistants to:
- Query hardware capabilities before suggesting models
- Check swarm health and worker status
- Generate code with automatic consensus voting
- List available models for the hardware
Add src/api/models.py:
- Pydantic models for OpenAI API compatibility
- ChatCompletionRequest/Response models
- Streaming response models (SSE format)
- Model listing and health check models
Add src/api/routes.py:
- POST /v1/chat/completions endpoint
- GET /v1/models endpoint
- GET /health and /v1/health endpoints
- Support for streaming (text/event-stream) and regular responses
- Message formatting for chat prompts
- Error handling with proper HTTP status codes
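The message formatting for chat prompts might look like this minimal sketch (the role-tag layout is an assumption; real formatting is model-specific, e.g. ChatML for Qwen):

```python
from typing import Dict, List

def format_chat_prompt(messages: List[Dict[str, str]]) -> str:
    """Flatten OpenAI-style chat messages into one prompt string,
    ending with an assistant tag so the model continues from there."""
    parts = [f"{m['role']}: {m['content']}" for m in messages]
    parts.append("assistant:")
    return "\n".join(parts)
```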
Add src/api/server.py:
- FastAPI application with CORS middleware
- Lifespan context for startup/shutdown
- Integration with SwarmManager
- Uvicorn server configuration
Update src/api/__init__.py:
- Export API classes and functions
Update main.py:
- Integrate API server into default workflow
- Start API server on http://127.0.0.1:PORT
- Show API endpoints and opencode configuration
- Graceful shutdown on Ctrl+C
Update AGENTS.md:
- Add note about Python support in MCP server
Phase 4 complete: Local Swarm now exposes an OpenAI-compatible API at:
- POST /v1/chat/completions (with streaming support)
- GET /v1/models
- GET /health
Ready for use with opencode and other OpenAI-compatible clients.
Add src/swarm/worker.py:
- SwarmWorker class managing single LLM instance
- WorkerStats for tracking performance metrics
- WorkerInfo dataclass for status reporting
- Async generation with streaming support
- Health monitoring and graceful shutdown
Add src/swarm/consensus.py:
- ConsensusEngine with multiple voting strategies
- Similarity voting using sentence-transformers embeddings
- Quality voting based on code structure and completeness
- Fastest voting for low-latency scenarios
- Majority voting as fallback
- Confidence scoring for all strategies
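Similarity voting picks the response closest to all others; here is a dependency-free sketch using token Jaccard overlap in place of the sentence-transformers embeddings the engine actually uses:

```python
from typing import List

def _jaccard(a: str, b: str) -> float:
    """Token-set overlap between two responses (0..1)."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

def similarity_vote(responses: List[str]) -> str:
    """Return the response most similar to all the others — the one
    closest to the 'centroid' of the candidate set."""
    def score(r: str) -> float:
        return sum(_jaccard(r, other) for other in responses if other is not r)
    return max(responses, key=score)
```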
Add src/swarm/manager.py:
- SwarmManager orchestrating multiple workers
- Parallel request distribution to all workers
- Integration with consensus engine
- Streaming support from fastest worker
- Status monitoring and health checks
- Graceful shutdown coordination
Update src/swarm/__init__.py:
- Export main classes for easy importing
Update main.py:
- Add --test mode for sample inference
- Integrate SwarmManager initialization
- Show inference results and consensus details
- Keep swarm running until interrupted
- Better error handling and status display
Phase 3 complete: Swarm can spawn N workers, generate responses,
and run consensus voting to select the best output.
Add src/backends/base.py:
- Abstract base class LLMBackend with async interface
- GenerationRequest/GenerationResponse dataclasses
- BackendError exception hierarchy
Add src/backends/llamacpp.py:
- llama.cpp backend for GGUF models
- Supports GPU offloading (CUDA/ROCm/Metal)
- Streaming and non-streaming generation
- Memory usage tracking
Add src/backends/mlx.py:
- MLX backend for Apple Silicon
- Optimized for Metal performance
- Unified memory model support
Add src/backends/__init__.py:
- Backend factory with auto-detection
- Selects MLX for Apple Silicon, llama.cpp for others
- Auto-configures GPU layers
Add src/models/downloader.py:
- HuggingFace model downloader
- Progress bar display with tqdm
- Cache management in ~/.local_swarm/models
- Support for all registered models
Update main.py:
- Integrate model downloading (--download-only mode)
- Test backend loading after download
- Async support for backend operations
- Better error handling and reporting
Phase 2 complete: Models can be downloaded and backends can load them.
- Add src/hardware/detector.py with cross-platform GPU/CPU/RAM detection
- Add src/models/registry.py with model database (Qwen, DeepSeek, CodeLlama)
- Add src/models/selector.py with optimal model selection algorithm
- Update main.py to use new modules and display results
Features:
- Detects NVIDIA GPUs on Windows/Linux
- Detects Apple Silicon on macOS
- Calculates available memory based on platform (100% GPU VRAM, 50% unified RAM)
- Selects optimal model, quantization, and instance count
- Supports 2-8 instances with quality-based selection
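The platform rule above (100% of discrete GPU VRAM, 50% of Apple Silicon unified RAM) reduces to a small helper (the name is illustrative):

```python
def usable_memory_gb(gpu_vram_gb: float, unified_ram_gb: float,
                     is_apple_silicon: bool) -> float:
    """Memory budget for model instances: all of a discrete GPU's VRAM,
    but only half of unified RAM, which is shared with the OS and apps."""
    if is_apple_silicon:
        return unified_ram_gb * 0.5
    return gpu_vram_gb
```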