Commit Graph

6 Commits

Author SHA1 Message Date
sleepy d30eedaa63 Fix opencode integration: streaming, response format, and tool handling
- Fix streaming to work even when tools are present (was forcing JSON mode)
- Fix response format: use empty list [] instead of null for tool_calls
- Add exclude_none config to ChatMessage model to match OpenAI format
- Remove tool instructions from prompt (they were confusing the 3B model)
- Fix tool call parsing to handle markdown code blocks properly
- Change default instances from 3 to 1 for faster debugging
- Allow 1 instance minimum in interactive config (was 2 on Mac)
- Add debug logging to track requests and responses

Fixes an infinite-loop issue where opencode would retry requests repeatedly
2026-02-24 03:44:46 +01:00
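The markdown-fence fix above could look roughly like this sketch (hypothetical helper; the names and the exact regex are assumptions, not the repo's actual code):

```python
import json
import re

# Matches a ```json ... ``` (or bare ```) fenced block and captures its body.
_FENCE = re.compile(r"```(?:json)?\s*(.*?)\s*```", re.DOTALL)

def parse_tool_call(text: str):
    """Extract a JSON tool call even when the model wraps it in a
    markdown code block; return None when no valid JSON is found."""
    m = _FENCE.search(text)
    raw = m.group(1) if m else text.strip()
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return None
```

Handling both fenced and bare JSON in one place keeps the parser tolerant of small models that format inconsistently.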
sleepy 93f5788d74 feat: Add tool calling support to API
- Add Tool, ToolCall, FunctionDefinition models
- Format prompts with tool descriptions for Qwen models
- Parse tool calls from model output (JSON and function call patterns)
- Auto-disable streaming when tools are present
- Return tool_calls in API response with proper finish_reason
- Support both simple function calls and JSON tool_calls format
2026-02-23 23:08:47 +01:00
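Formatting tool descriptions into the prompt, as mentioned above, might be sketched like this (the template wording is an assumption; the repo's actual prompt format may differ):

```python
import json

def format_tools_for_prompt(tools: list[dict]) -> str:
    """Render OpenAI-style tool definitions into a plain-text block
    that a small instruct model can read."""
    lines = ["You have access to these tools:"]
    for tool in tools:
        fn = tool["function"]
        lines.append(f"- {fn['name']}: {fn.get('description', '')}")
        # JSON-encode the parameter schema so it stays unambiguous.
        lines.append(f"  parameters: {json.dumps(fn.get('parameters', {}))}")
    return "\n".join(lines)
```

A Qwen-family model then sees the tool names and schemas inline and can emit a JSON tool call in response.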
sleepy 472961cc23 feat: Apple Silicon MLX support, sequential workers, live status display, worker names
Major improvements for macOS/Apple Silicon:
- Add spawn-based multiprocessing for Metal GPU compatibility
- Implement sequential generation mode for multiple workers
- Each worker runs one at a time to avoid GPU conflicts
- All workers stay loaded in memory for fast switching

User Experience:
- 100 unique worker names (Alpha, Raven, Zeus, etc.)
- Live terminal status display with progress bars
- Show context usage and last output per worker
- Display IP addresses for network workers

Configuration:
- Default port changed to 17615 (from 8000)
- Context size options: 16K, 32K (default), 64K, 128K
- Offloading options: none, 20%, 50%
- Default max_tokens: 1024

MLX Quantization Support:
- Support 3bit, 4bit, 5bit, 6bit, 8bit MLX models
- Proper memory calculations for each quantization
- Sequential mode automatically enabled on Apple Silicon
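The per-quantization memory calculation reduces to bits-per-parameter arithmetic; a sketch (function name and the overhead simplification are assumptions):

```python
def quantized_weight_bytes(n_params: int, bits: int) -> int:
    """Approximate weight memory for a model quantized to `bits` bits
    per parameter. Ignores KV cache and runtime overhead."""
    if bits not in (3, 4, 5, 6, 8):
        raise ValueError(f"unsupported MLX quantization: {bits}-bit")
    return n_params * bits // 8
```

For example, a 3B-parameter model at 4-bit needs roughly 1.5 GB for weights alone.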

Bug Fixes:
- Fix instance calculation (was always returning 1)
- Fix quantization bit detection for MLX models
- Fix config.json generation in model folders
- Preload MiniLM embedding model during init

Files Changed:
- main.py: Spawn method for macOS, port 17615
- src/backends/mlx.py: MLX generation with stop sequences
- src/models/selector.py: Fix instance calculation
- src/swarm/manager.py: Sequential generation mode
- src/swarm/consensus.py: Preload embedding model
- src/swarm/worker.py: Progress tracking per worker
- src/swarm/worker_names.py: 100 unique names (NEW)
- src/swarm/status_monitor.py: Live display (NEW)
- src/interactive.py: Context/offload menus
- src/models/registry.py: MLX quantization sizes
- src/api/server.py: Port 17615, live status
2026-02-23 22:57:38 +01:00
sleepy e794fe29d4 Fix critical bugs, concurrency issues, and code quality across codebase
- Fix asyncio.create_task() crash in zeroconf background thread (discovery.py)
- Fix int(bytes) TypeError in peer property decoding (discovery.py)
- Fix unreachable Android/Qualcomm GPU detection path (detector.py)
- Add nvmlShutdown() to prevent NVML resource leak (detector.py)
- Wrap blocking inference in asyncio.to_thread() to unblock event loop (llamacpp.py, mlx.py)
- Initialize and use asyncio.Lock for concurrency safety (llamacpp.py)
- Fix VRAM regex matching GPU index instead of byte value (amd.py)
- Implement best_of_n federation strategy (was dead code) (federation.py)
- Lazy-import aiohttp/mcp to avoid hard ImportError (federation.py, mcp_server.py)
- Fix response_model conflict with streaming responses (routes.py)
- Fix CORS allow_origins=* with allow_credentials=True violation (server.py)
- Fix memory calculation using pre-clamped instance count (selector.py)
- Fix calculate_max_instances returning 2 when only 0-1 fit (selector.py)
- Atomic downloads via .part file to prevent caching partial files (downloader.py)
- Replace recursive menu navigation with loop-based approach (interactive.py)
- Implement actual majority voting in _majority_vote (consensus.py)
- Fix false-positive list detection in quality scoring (consensus.py)
- Replace 15+ bare except: clauses with except Exception: across the codebase
- Fix .json() -> .model_dump_json() for Pydantic v2 (routes.py)
- Remove unused MCP imports, add empty prompt validation (mcp_server.py)
- Use tokenizer for accurate MLX token counting (mlx.py)
- Fix memory estimate from FP32 (*4) to quantized (*0.6) (llamacpp.py)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 20:11:58 +01:00
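The atomic-download fix above follows the standard .part-then-rename pattern; a minimal sketch of the idea (helper name is invented, and the real downloader streams from the network rather than taking bytes):

```python
import os

def write_atomic(dest: str, data: bytes) -> None:
    """Write to dest + '.part', then rename into place. os.replace is
    atomic on POSIX, so readers can never observe a partial file under
    the final name."""
    part = dest + ".part"
    with open(part, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())   # ensure bytes hit disk before the rename
    os.replace(part, dest)
```

An interrupted download leaves only a stale .part file, which can be safely deleted or resumed.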
sleepy 2f547fe101 Phase 6: Network Federation (#1)
* Update PLAN.md with new phases

- Add Phase 5: CLI & Interactive Interface
  - Interactive menu system with 3 options
  - Hardware display with detailed specs
  - Resource usage monitoring
  - Custom configuration wizard

- Add Phase 5.5: MCP Server
  - MCP protocol implementation
  - 5 MCP tools for AI assistants
  - Dual server mode (HTTP + MCP)

- Reorganize phase structure for clarity

* Phase 6: Implement network federation (WIP)

Add src/network/discovery.py:
- SwarmDiscovery class using mDNS/Bonjour
- PeerInfo dataclass for peer metadata
- Automatic peer discovery on local network
- Service advertising for this swarm
- Stale peer detection and cleanup
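The PeerInfo dataclass with stale detection might look like this (field names and the 30-second TTL are assumptions, not the repo's actual values):

```python
import time
from dataclasses import dataclass, field

@dataclass
class PeerInfo:
    name: str
    address: str
    port: int
    last_seen: float = field(default_factory=time.time)

    def is_stale(self, ttl: float = 30.0) -> bool:
        # A peer that has not re-announced within `ttl` seconds
        # is considered gone and can be cleaned up.
        return time.time() - self.last_seen > ttl
```

The discovery loop would refresh last_seen on each mDNS announcement and periodically evict stale entries.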

Add src/network/federation.py:
- FederationClient for HTTP communication with peers
- FederatedSwarm for managing cross-swarm consensus
- Two-phase voting: local consensus then peer voting
- Weighted voting strategy based on confidence
- Federation status monitoring
- Peer health checking
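The confidence-weighted voting strategy reduces to summing confidence per candidate answer; a sketch (a simplified stand-in for the FederatedSwarm logic, not its actual implementation):

```python
from collections import defaultdict

def weighted_vote(votes: list[tuple[str, float]]) -> str:
    """votes: (answer, confidence) pairs from local and peer swarms.
    Returns the answer with the highest total confidence."""
    totals: dict[str, float] = defaultdict(float)
    for answer, confidence in votes:
        totals[answer] += confidence
    return max(totals, key=totals.get)
```

Two low-confidence peers agreeing can thus outweigh one high-confidence dissenter, which is the point of cross-swarm consensus.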

Add src/network/__init__.py:
- Export network classes

Update src/api/routes.py:
- POST /v1/federation/vote - Receive votes from peers
- GET /v1/federation/status - Get federation status
- GET /v1/federation/peers - List discovered peers

Update requirements.txt:
- Add zeroconf for mDNS discovery

Features:
- Auto-discovery of other Local Swarm instances
- Cross-swarm consensus voting
- Configurable minimum peer requirements
- Fallback to local-only if no peers available
- Peer health monitoring

TODO:
- Integrate federation into main.py
- Add --federation flag
- Test multi-machine setup
2026-02-23 18:05:27 +01:00
sleepy 4367c79d83 Phase 4: Implement OpenAI-compatible API server
Add src/api/models.py:
- Pydantic models for OpenAI API compatibility
- ChatCompletionRequest/Response models
- Streaming response models (SSE format)
- Model listing and health check models

Add src/api/routes.py:
- POST /v1/chat/completions endpoint
- GET /v1/models endpoint
- GET /health and /v1/health endpoints
- Support for streaming (text/event-stream) and regular responses
- Message formatting for chat prompts
- Error handling with proper HTTP status codes
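The streaming (text/event-stream) framing mentioned above follows the OpenAI chunk format; a sketch of one chunk emitter (helper name and the trimmed payload fields are assumptions, the real response includes more fields such as id and created):

```python
import json

def sse_chunk(model: str, delta: str) -> str:
    """Format one OpenAI-style streaming chunk as a server-sent event:
    a 'data: <json>' line terminated by a blank line."""
    payload = {
        "object": "chat.completion.chunk",
        "model": model,
        "choices": [{"index": 0,
                     "delta": {"content": delta},
                     "finish_reason": None}],
    }
    return f"data: {json.dumps(payload)}\n\n"

SSE_DONE = "data: [DONE]\n\n"  # sentinel that terminates the stream
```

Clients such as opencode concatenate the delta.content fragments and stop when they receive the [DONE] sentinel.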

Add src/api/server.py:
- FastAPI application with CORS middleware
- Lifespan context for startup/shutdown
- Integration with SwarmManager
- Uvicorn server configuration

Update src/api/__init__.py:
- Export API classes and functions

Update main.py:
- Integrate API server into default workflow
- Start API server on http://127.0.0.1:PORT
- Show API endpoints and opencode configuration
- Graceful shutdown on Ctrl+C

Update AGENTS.md:
- Add note about Python support in MCP server

Phase 4 complete: Local Swarm now exposes OpenAI-compatible API at:
- POST /v1/chat/completions (with streaming support)
- GET /v1/models
- GET /health

Ready for use with opencode and other OpenAI-compatible clients.
2026-02-23 17:29:16 +01:00