- Fix streaming to work even when tools are present (was forcing JSON mode)
- Fix response format: use empty list [] instead of null for tool_calls
- Add exclude_none config to ChatMessage model to match OpenAI format
- Remove tool instructions from prompt (they were confusing the 3B model)
- Fix tool call parsing to handle markdown code blocks properly
- Change default instances from 3 to 1 for faster debugging
- Allow a minimum of 1 instance in interactive config (was 2 on Mac)
- Add debug logging to track requests and responses
Fixes an infinite-loop issue where opencode would retry requests repeatedly.
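
A minimal sketch of the response-format change, assuming Pydantic v2 and illustrative field names: tool_calls defaults to an empty list instead of null, and None-valued fields are dropped at serialization time so the payload matches what OpenAI clients expect.

    # Sketch only; the real ChatMessage in src/api/models.py may differ.
    from typing import Optional
    from pydantic import BaseModel, Field

    class ToolCall(BaseModel):
        id: str
        type: str = "function"
        function: dict

    class ChatMessage(BaseModel):
        role: str
        content: Optional[str] = None
        tool_calls: list[ToolCall] = Field(default_factory=list)  # [] instead of null

    msg = ChatMessage(role="assistant", content="hello")
    # exclude_none drops None-valued fields, mirroring the OpenAI response shape.
    print(msg.model_dump(exclude_none=True))
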
- select_optimal_model was checking HF API for available quantizations
- This caused the menu to hang or slow down when changing context
- Now only checks availability when browsing or custom config
- Recommended config uses default quantizations (faster)
- Update list_models() and build_models() to accept check_available parameter
- Update interactive.py to pass check_available=True on Mac
- Menu now filters out non-existent quantizations in real-time
- Users can only select quantizations that actually exist on HF
This prevents the issue where a user selects 4bit but the system
tries to download a 5bit variant, because only certain quants exist for each model.
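
A rough sketch of the check_available plumbing; only the control flow is taken from the notes above, the function bodies are assumed:

    # Only the browse/custom paths pay the cost of the HF API check;
    # the recommended path keeps the default quantizations.
    def build_models(check_available: bool = False) -> list:
        variants = build_model_variants()   # e.g. candidate MLX repo ids
        if check_available:
            variants = filter_available_mlx_quants(variants)
        return variants

    def list_models(check_available: bool = False) -> list:
        return build_models(check_available=check_available)
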
- Add filter_available_mlx_quants() to check HuggingFace for existing repos
- Update build_model_variants() to optionally check availability
- Menu will now only show quantizations that actually exist
- Prevents users from selecting non-existent quantizations
Note: This adds a small delay when building the model list because it queries
the HF API, but it prevents download failures later.
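
A hedged sketch of the availability check itself, using huggingface_hub (which the HuggingFace downloader presumably already depends on); the signature and the mlx-community repo ids are examples:

    from huggingface_hub import HfApi
    from huggingface_hub.utils import RepositoryNotFoundError

    def filter_available_mlx_quants(repo_ids: list[str]) -> list[str]:
        # Keep only the repos that actually exist on the Hub.
        api = HfApi()
        available = []
        for repo_id in repo_ids:
            try:
                api.model_info(repo_id)      # raises if the repo does not exist
                available.append(repo_id)
            except RepositoryNotFoundError:
                continue
        return available

    # Example: only some of these quantizations exist for a given model.
    candidates = [f"mlx-community/Qwen2.5-Coder-7B-Instruct-{q}"
                  for q in ("3bit", "4bit", "5bit", "6bit", "8bit")]
    print(filter_available_mlx_quants(candidates))
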
- Add _validate_mlx_model_exists() to check HuggingFace repos
- Show warning when selected quantization doesn't exist
- List available quantizations for the model
- Better error messages with suggestions
This prevents attempts to download non-existent quantizations like 5bit
when only 3bit, 4bit, 6bit, and 8bit are available.
On Mac (Apple Silicon) with seed variation:
- Total memory no longer multiplied by number of responses
- Memory is shared across all responses (same model, different seeds)
- list_available_configurations: Uses 3 responses, single memory calculation
- custom_configuration: Memory doesn't scale with response count
- show_startup_summary: Shows '(shared)' for RAM on Mac
- All memory displays now accurate for seed variation mode
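
The fix boils down to one line of arithmetic; a sketch with an assumed helper name:

    def estimate_total_memory_gb(model_size_gb: float, num_responses: int,
                                 seed_variation: bool) -> float:
        # With seed variation a single loaded model serves every response,
        # so memory does not scale with the response count.
        if seed_variation:
            return model_size_gb
        return model_size_gb * num_responses
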
- Add use_seed_variation mode: Generate multiple responses from one model
with different random seeds (saves memory on Apple Silicon)
- Add enable_reviewer mode: A critic worker validates consensus results
and triggers retries if output looks suspicious
- Add generate_with_seed_variation() method for single-model multi-response
- Add generate_with_reviewer() method with feedback loop
- Auto-enable seed variation on Apple Silicon to save memory
- Configurable max_retries for reviewer mode
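
A sketch of the seed-variation path under assumed names; the GenerationRequest stub only stands in for the backend interface introduced later in this log:

    from dataclasses import dataclass

    @dataclass
    class GenerationRequest:          # stand-in for src/backends/base.py
        prompt: str
        seed: int = 0
        max_tokens: int = 512

    async def generate_with_seed_variation(backend, prompt: str, num_responses: int = 3) -> list:
        # One loaded model, several seeds: memory stays flat while the
        # consensus engine still receives multiple candidate answers.
        return [await backend.generate(GenerationRequest(prompt=prompt, seed=seed))
                for seed in range(num_responses)]
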
- Add Tool, ToolCall, FunctionDefinition models
- Format prompts with tool descriptions for Qwen models
- Parse tool calls from model output (JSON and function call patterns)
- Auto-disable streaming when tools are present
- Return tool_calls in API response with proper finish_reason
- Support both simple function calls and JSON tool_calls format
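
A sketch of the tool-call parsing described above, tolerating markdown code fences around the JSON (exact model output formats vary):

    import json
    import re

    def parse_tool_calls(text: str):
        # Strip ```json ... ``` (or bare ```) fences if present.
        match = re.search(r"```(?:json)?\s*(.*?)\s*```", text, re.DOTALL)
        payload = match.group(1) if match else text.strip()
        try:
            data = json.loads(payload)
        except json.JSONDecodeError:
            return None
        calls = data if isinstance(data, list) else [data]
        # Keep only well-formed calls that name a function.
        return [c for c in calls if isinstance(c, dict) and "name" in c] or None
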
* Update PLAN.md with new phases
- Add Phase 5: CLI & Interactive Interface
- Interactive menu system with 3 options
- Hardware display with detailed specs
- Resource usage monitoring
- Custom configuration wizard
- Add Phase 5.5: MCP Server
- MCP protocol implementation
- 5 MCP tools for AI assistants
- Dual server mode (HTTP + MCP)
- Reorganize phase structure for clarity
* Phase 6: Implement network federation (WIP)
Add src/network/discovery.py:
- SwarmDiscovery class using mDNS/Bonjour
- PeerInfo dataclass for peer metadata
- Automatic peer discovery on local network
- Service advertising for this swarm
- Stale peer detection and cleanup
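
For reference, a minimal zeroconf sketch of the advertise-and-browse pattern SwarmDiscovery wraps; the service type, address, and port are placeholders:

    import socket
    from zeroconf import ServiceBrowser, ServiceInfo, ServiceListener, Zeroconf

    SERVICE_TYPE = "_localswarm._tcp.local."     # assumed service type

    class PeerListener(ServiceListener):
        def add_service(self, zc, type_, name):
            info = zc.get_service_info(type_, name)
            if info:
                print("discovered peer:", name, info.parsed_addresses(), info.port)
        def remove_service(self, zc, type_, name):
            print("peer left:", name)
        def update_service(self, zc, type_, name):
            pass

    zc = Zeroconf()
    # Advertise this swarm so other instances can find it...
    info = ServiceInfo(SERVICE_TYPE, "my-swarm." + SERVICE_TYPE,
                       addresses=[socket.inet_aton("192.168.1.10")], port=8000)
    zc.register_service(info)
    # ...and browse for the others on the LAN.
    ServiceBrowser(zc, SERVICE_TYPE, PeerListener())
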
Add src/network/federation.py:
- FederationClient for HTTP communication with peers
- FederatedSwarm for managing cross-swarm consensus
- Two-phase voting: local consensus then peer voting
- Weighted voting strategy based on confidence
- Federation status monitoring
- Peer health checking
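
The weighted strategy can be illustrated in a few lines; the vote payload shape (answer plus confidence) is an assumption:

    from collections import defaultdict

    def weighted_vote(votes: list[dict]) -> str:
        # votes: [{"answer": "...", "confidence": 0.87}, ...]
        scores = defaultdict(float)
        for vote in votes:
            scores[vote["answer"]] += vote.get("confidence", 0.0)
        # The answer with the highest summed confidence wins.
        return max(scores, key=scores.get)
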
Add src/network/__init__.py:
- Export network classes
Update src/api/routes.py:
- POST /v1/federation/vote - Receive votes from peers
- GET /v1/federation/status - Get federation status
- GET /v1/federation/peers - List discovered peers
Update requirements.txt:
- Add zeroconf for mDNS discovery
Features:
- Auto-discovery of other Local Swarm instances
- Cross-swarm consensus voting
- Configurable minimum peer requirements
- Fallback to local-only if no peers available
- Peer health monitoring
TODO:
- Integrate federation into main.py
- Add --federation flag
- Test multi-machine setup
- Add interactive mode section with screenshots
- Document the 3 menu options (recommended, browse, custom)
- Add startup summary section showing what info is displayed
- Add interactive features and MCP server to features list
- Document --auto flag to skip menu
- Add hardware/resource usage display examples
Add src/interactive.py:
- Interactive model selection menu with 3 options:
1. Recommended Configuration (auto-detect best)
2. Browse All Configurations (see all feasible models)
3. Custom Configuration (user-specified model + instances)
- Hardware info display with detailed specs
- Resource usage monitoring showing:
- Swarm status, model, workers
- Memory usage per worker
- Worker statistics (requests, latency, tokens/sec)
- Custom configuration wizard:
- Select from available models
- Choose model size (3B, 7B, 14B, etc.)
- Pick quantization level (Q4, Q5, Q6)
- Specify number of instances
- Runtime menu for monitoring (refresh/quit)
Update main.py:
- Default mode now shows interactive menu
- Add --auto flag to skip menu and use recommended config
- Show comprehensive startup summary with hardware + config + usage
- Better integration with interactive module
- Remove redundant print functions (now in interactive.py)
Features:
- Clear screen for clean menu display
- Formatted headers and sections
- Menu validation and error handling
- Memory utilization percentage display
- Real-time worker status with health indicators
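
The memory-utilization line can be produced with psutil (assumed to be the underlying library here):

    import psutil

    mem = psutil.virtual_memory()
    print(f"Memory: {mem.used / 2**30:.1f} / {mem.total / 2**30:.1f} GiB "
          f"({mem.percent:.0f}% used)")
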
- Add MCP Server section explaining the --mcp flag
- Document the 5 MCP tools available to AI assistants
- Add --mcp to CLI Options section
- Explain benefits of MCP integration for automatic hardware queries
Add src/mcp_server.py:
- LocalSwarmMCPServer class implementing MCP protocol
- 5 MCP tools exposed:
- get_hardware_info: Check CPU, GPU, RAM
- get_swarm_status: Get worker status and model info
- generate_code: Generate with consensus voting
- list_available_models: Show all runnable models
- get_worker_details: Detailed worker statistics
- Integration with SwarmManager for code generation
- Stdio transport for AI assistant communication
Update requirements.txt:
- Add mcp>=1.0.0 dependency
Update main.py:
- Add --mcp flag to enable MCP server
- Run MCP server alongside HTTP API when enabled
- Both servers share the same SwarmManager instance
- Display MCP status in startup message
Local Swarm now supports both:
- HTTP API (for external clients, curl, opencode)
- MCP server (for tight AI assistant integration)
Usage:
python main.py # HTTP API only
python main.py --mcp # HTTP API + MCP server
MCP tools allow AI assistants to:
- Query hardware capabilities before suggesting models
- Check swarm health and worker status
- Generate code with automatic consensus voting
- List available models for the hardware
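
For orientation, a hedged sketch of exposing one of these tools through the MCP Python SDK's FastMCP helper; the actual mcp_server.py implements its own LocalSwarmMCPServer over stdio, so treat this as illustrative only, with placeholder values:

    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("local-swarm")

    @mcp.tool()
    def get_hardware_info() -> dict:
        """Report CPU, GPU and RAM so an assistant can pick a suitable model."""
        return {"cpu_cores": 8, "gpu": "Apple M2", "ram_gb": 16}   # placeholder values

    if __name__ == "__main__":
        mcp.run()   # stdio transport by default
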
Add src/api/models.py:
- Pydantic models for OpenAI API compatibility
- ChatCompletionRequest/Response models
- Streaming response models (SSE format)
- Model listing and health check models
Add src/api/routes.py:
- POST /v1/chat/completions endpoint
- GET /v1/models endpoint
- GET /health and /v1/health endpoints
- Support for streaming (text/event-stream) and regular responses
- Message formatting for chat prompts
- Error handling with proper HTTP status codes
Add src/api/server.py:
- FastAPI application with CORS middleware
- Lifespan context for startup/shutdown
- Integration with SwarmManager
- Uvicorn server configuration
Update src/api/__init__.py:
- Export API classes and functions
Update main.py:
- Integrate API server into default workflow
- Start API server on http://127.0.0.1:PORT
- Show API endpoints and opencode configuration
- Graceful shutdown on Ctrl+C
Update AGENTS.md:
- Add note about Python support in MCP server
Phase 4 complete: Local Swarm now exposes an OpenAI-compatible API at:
- POST /v1/chat/completions (with streaming support)
- GET /v1/models
- GET /health
Ready for use with opencode and other OpenAI-compatible clients.
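
A sketch of pointing any OpenAI-compatible client at the local endpoint; the port and model name are placeholders, use the values printed at startup:

    from openai import OpenAI

    client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="local-swarm")
    resp = client.chat.completions.create(
        model="local-swarm",
        messages=[{"role": "user", "content": "Write a Python hello world."}],
    )
    print(resp.choices[0].message.content)
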
Add src/swarm/worker.py:
- SwarmWorker class managing single LLM instance
- WorkerStats for tracking performance metrics
- WorkerInfo dataclass for status reporting
- Async generation with streaming support
- Health monitoring and graceful shutdown
Add src/swarm/consensus.py:
- ConsensusEngine with multiple voting strategies
- Similarity voting using sentence-transformers embeddings
- Quality voting based on code structure and completeness
- Fastest voting for low-latency scenarios
- Majority voting as fallback
- Confidence scoring for all strategies
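
The similarity strategy reduced to a sketch: embed every candidate and pick the one that agrees most with the rest (the embedding model name is the usual lightweight default, an assumption here):

    import numpy as np
    from sentence_transformers import SentenceTransformer

    def similarity_vote(candidates: list[str]) -> str:
        model = SentenceTransformer("all-MiniLM-L6-v2")
        emb = model.encode(candidates, normalize_embeddings=True)
        sims = emb @ emb.T                    # cosine similarities
        centrality = sims.sum(axis=1)         # agreement of each answer with all others
        return candidates[int(np.argmax(centrality))]
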
Add src/swarm/manager.py:
- SwarmManager orchestrating multiple workers
- Parallel request distribution to all workers
- Integration with consensus engine
- Streaming support from fastest worker
- Status monitoring and health checks
- Graceful shutdown coordination
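
The fan-out itself is a few lines with asyncio.gather; a sketch with an assumed worker interface:

    import asyncio

    async def generate_all(workers, request):
        results = await asyncio.gather(
            *(worker.generate(request) for worker in workers),
            return_exceptions=True,           # one failed worker should not sink the batch
        )
        return [r for r in results if not isinstance(r, Exception)]
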
Update src/swarm/__init__.py:
- Export main classes for easy importing
Update main.py:
- Add --test mode for sample inference
- Integrate SwarmManager initialization
- Show inference results and consensus details
- Keep swarm running until interrupted
- Better error handling and status display
Phase 3 complete: Swarm can spawn N workers, generate responses,
and run consensus voting to select the best output.
Add src/backends/base.py:
- Abstract base class LLMBackend with async interface
- GenerationRequest/GenerationResponse dataclasses
- BackendError exception hierarchy
Add src/backends/llamacpp.py:
- llama.cpp backend for GGUF models
- Supports GPU offloading (CUDA/ROCm/Metal)
- Streaming and non-streaming generation
- Memory usage tracking
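
A sketch of the llama-cpp-python usage this backend builds on; the model path is a placeholder, and n_gpu_layers=-1 offloads all layers when a GPU build is installed:

    from llama_cpp import Llama

    llm = Llama(model_path="models/qwen2.5-coder-7b-q4_k_m.gguf",
                n_ctx=4096, n_gpu_layers=-1)
    for chunk in llm("Write a haiku about consensus.", max_tokens=64, stream=True):
        print(chunk["choices"][0]["text"], end="", flush=True)
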
Add src/backends/mlx.py:
- MLX backend for Apple Silicon
- Optimized for Metal performance
- Unified memory model support
Add src/backends/__init__.py:
- Backend factory with auto-detection
- Selects MLX for Apple Silicon, llama.cpp for others
- Auto-configures GPU layers
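
The auto-detection amounts to a platform check; the class names below match the modules above, but the constructor details are assumed:

    import platform

    def create_backend():
        if platform.system() == "Darwin" and platform.machine() == "arm64":
            from .mlx import MLXBackend            # assumed class name
            return MLXBackend()
        from .llamacpp import LlamaCppBackend      # assumed class name
        return LlamaCppBackend()
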
Add src/models/downloader.py:
- HuggingFace model downloader
- Progress bar display with tqdm
- Cache management in ~/.local_swarm/models
- Support for all registered models
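
The download path leans on huggingface_hub, which already provides progress bars and caching; the repo id below is an example:

    from pathlib import Path
    from huggingface_hub import snapshot_download

    cache_dir = Path.home() / ".local_swarm" / "models"
    local_path = snapshot_download(
        repo_id="Qwen/Qwen2.5-Coder-7B-Instruct-GGUF",   # example repo
        cache_dir=str(cache_dir),
    )
    print("model files in:", local_path)
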
Update main.py:
- Integrate model downloading (--download-only mode)
- Test backend loading after download
- Async support for backend operations
- Better error handling and reporting
Phase 2 complete: Models can be downloaded and backends can load them.
- Add src/hardware/detector.py with cross-platform GPU/CPU/RAM detection
- Add src/models/registry.py with model database (Qwen, DeepSeek, CodeLlama)
- Add src/models/selector.py with optimal model selection algorithm
- Update main.py to use new modules and display results
Features:
- Detects NVIDIA GPUs on Windows/Linux
- Detects Apple Silicon on macOS
- Calculates available memory based on platform (100% GPU VRAM, 50% unified RAM)
- Selects optimal model, quantization, and instance count
- Supports 2-8 instances with quality-based selection
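
The memory rule reduces to a small function; a sketch of the 100%/50% split described above:

    def available_memory_gb(is_apple_silicon: bool, vram_gb: float, ram_gb: float) -> float:
        if is_apple_silicon:
            return ram_gb * 0.5      # unified memory is shared with the OS and apps
        return vram_gb               # discrete GPU VRAM is used in full
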