Commit Graph

10 Commits

Author SHA1 Message Date
sleepy dcca89d89a fix: OpenAI API compatibility for hollama and other clients
- Fixed ChatMessage.tool_calls to be Optional with default None (excluded when empty)
- Added logprobs field to ChatCompletionChoice (always included as null)
- Added stats and system_fingerprint to ChatCompletionResponse
- Fixed streaming response to use delta format (not message format)
- Fixed non-streaming response to include logprobs: null
- Updated tool instructions to include 'NO explanations'
- Added pytest-asyncio markers to async tests
- All 41 tests passing

This fixes the "Cannot read properties of undefined (reading 'content')" error in hollama and ensures compatibility with OpenAI clients.
2026-02-25 19:39:05 +01:00
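A framework-free sketch of the serialization rules this commit describes (the project itself uses Pydantic models; the function names and shapes below are illustrative assumptions):

```python
from typing import Any, Optional

def serialize_message(role: str, content: Optional[str],
                      tool_calls: Optional[list] = None) -> dict[str, Any]:
    """Build an OpenAI-style message dict, omitting tool_calls when empty."""
    msg: dict[str, Any] = {"role": role, "content": content}
    if tool_calls:  # excluded when None or empty, per the fix above
        msg["tool_calls"] = tool_calls
    return msg

def serialize_choice(index: int, message: dict[str, Any],
                     finish_reason: str = "stop") -> dict[str, Any]:
    # logprobs is always present as null: some clients index into the
    # choice object and break on a missing key.
    return {"index": index, "message": message,
            "logprobs": None, "finish_reason": finish_reason}
```

The asymmetry is the point: an empty `tool_calls` disappears from the payload, while `logprobs` is always emitted as `null`.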
sleepy 1acebbc6a2 refactor(models): extract memory calculations and config from selector
Changes:
- selector.py: 486 → 329 lines (-32%)
- Extracted memory calculation functions to memory_calculator.py
- Extracted constants to selector_config.json
- Updated selector.py to load config and import from memory_calculator
- All 35 tests pass
2026-02-25 13:23:47 +01:00
sleepy 0886b9ae73 fix: handle --model with full format (model:size:quant)
- Parse model ID with format like qwen2.5-coder:7b:4bit
- Return specific error if requested config not found or doesn't fit
- Don't fall back to auto-selection when specific config requested
2026-02-24 13:15:59 +01:00
sleepy d30eedaa63 Fix opencode integration: streaming, response format, and tool handling
- Fix streaming to work even when tools are present (was forcing JSON mode)
- Fix response format: use empty list [] instead of null for tool_calls
- Add exclude_none config to ChatMessage model to match OpenAI format
- Remove tool instructions from prompt (were confusing 3B model)
- Fix tool call parsing to handle markdown code blocks properly
- Change default instances from 3 to 1 for faster debugging
- Allow 1 instance minimum in interactive config (was 2 on Mac)
- Add debug logging to track requests and responses

Fixes infinite loop issue where opencode would retry requests repeatedly
2026-02-24 03:44:46 +01:00
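The markdown-fence handling in the tool-call fix can be sketched like this (the regex and function name are illustrative, not the project's actual code):

```python
import json
import re

# Match an optional ```json ... ``` fence around the payload.
_FENCE = re.compile(r"```(?:json)?\s*(.*?)\s*```", re.DOTALL)

def parse_tool_call(text: str):
    """Extract a JSON tool call that a small model may have wrapped
    in a markdown code fence; returns None if no valid JSON is found."""
    m = _FENCE.search(text)
    payload = m.group(1) if m else text
    try:
        return json.loads(payload)
    except json.JSONDecodeError:
        return None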
sleepy 2461f45ca8 fix: Remove slow HF API check from recommended config selection
- select_optimal_model was checking HF API for available quantizations
- This caused menu to hang/slow down when changing context
- Now only checks availability when browsing or custom config
- Recommended config uses default quantizations (faster)
2026-02-23 23:54:57 +01:00
sleepy f2d0fddfa4 fix: Update selector to check available quantizations on Mac 2026-02-23 23:52:29 +01:00
sleepy 792c40594e fix: Recommended config shows 3 responses on Mac instead of 1
Updated _try_model_with_context and _try_smallest_variant_with_context:
- On Mac (use_mlx=True): Returns 3 responses by default
- On other platforms: Still calculates based on VRAM
- Memory calculation fixed for Mac (doesn't multiply by response count)

Fixes issue where recommended config showed 'Responses: 1' on Mac
2026-02-23 23:46:01 +01:00
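The platform split described above amounts to something like this (parameter names and the VRAM formula are assumptions; only the Mac-returns-3 behavior comes from the commit):

```python
def default_response_count(use_mlx: bool, vram_gb: float,
                           per_instance_gb: float) -> int:
    """Pick how many responses to generate for the recommended config."""
    if use_mlx:
        # Apple Silicon: unified memory, one loaded model generating
        # sequentially, so 3 responses by default.
        return 3
    # Other platforms: derive the count from how many instances fit in VRAM.
    return max(1, int(vram_gb // per_instance_gb))
```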
sleepy 472961cc23 feat: Apple Silicon MLX support, sequential workers, live status display, worker names
Major improvements for macOS/Apple Silicon:
- Add spawn-based multiprocessing for Metal GPU compatibility
- Implement sequential generation mode for multiple workers
- Each worker runs one-at-a-time to avoid GPU conflicts
- All workers stay loaded in memory for fast switching

User Experience:
- 100 unique worker names (Alpha, Raven, Zeus, etc.)
- Live terminal status display with progress bars
- Show context usage and last output per worker
- Display IP addresses for network workers

Configuration:
- Default port changed to 17615 (from 8000)
- Context size options: 16K, 32K (default), 64K, 128K
- Offloading options: none, 20%, 50%
- Default max_tokens: 1024

MLX Quantization Support:
- Support 3bit, 4bit, 5bit, 6bit, 8bit MLX models
- Proper memory calculations for each quantization
- Sequential mode automatically enabled on Apple Silicon

Bug Fixes:
- Fix instance calculation (was always returning 1)
- Fix quantization bit detection for MLX models
- Fix config.json generation in model folders
- Preload MiniLM embedding model during init

Files Changed:
- main.py: Spawn method for macOS, port 17615
- src/backends/mlx.py: MLX generation with stop sequences
- src/models/selector.py: Fix instance calculation
- src/swarm/manager.py: Sequential generation mode
- src/swarm/consensus.py: Preload embedding model
- src/swarm/worker.py: Progress tracking per worker
- src/swarm/worker_names.py: 100 unique names (NEW)
- src/swarm/status_monitor.py: Live display (NEW)
- src/interactive.py: Context/offload menus
- src/models/registry.py: MLX quantization sizes
- src/api/server.py: Port 17615, live status
2026-02-23 22:57:38 +01:00
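The sequential generation mode above boils down to serializing GPU access while keeping every worker resident. A minimal sketch of that idea (class and method names are invented; the real implementation lives in src/swarm/manager.py):

```python
import asyncio

class SequentialSwarm:
    """Run generation requests one at a time so multiple workers can
    share a single Metal GPU without conflicts."""

    def __init__(self) -> None:
        self._gpu_lock = asyncio.Lock()

    async def generate(self, worker, prompt: str) -> str:
        # Workers stay loaded in memory for fast switching; only the
        # actual generation step is serialized behind the lock.
        async with self._gpu_lock:
            return await asyncio.to_thread(worker, prompt)
```

Running the blocking `worker` call in a thread keeps the event loop (and the live status display) responsive while the lock guarantees one generation at a time.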
sleepy e794fe29d4 Fix critical bugs, concurrency issues, and code quality across codebase
- Fix asyncio.create_task() crash in zeroconf background thread (discovery.py)
- Fix int(bytes) TypeError in peer property decoding (discovery.py)
- Fix unreachable Android/Qualcomm GPU detection path (detector.py)
- Add nvmlShutdown() to prevent NVML resource leak (detector.py)
- Wrap blocking inference in asyncio.to_thread() to unblock event loop (llamacpp.py, mlx.py)
- Initialize and use asyncio.Lock for concurrency safety (llamacpp.py)
- Fix VRAM regex matching GPU index instead of byte value (amd.py)
- Implement best_of_n federation strategy (was dead code) (federation.py)
- Lazy-import aiohttp/mcp to avoid hard ImportError (federation.py, mcp_server.py)
- Fix response_model conflict with streaming responses (routes.py)
- Fix CORS allow_origins=* with allow_credentials=True violation (server.py)
- Fix memory calculation using pre-clamped instance count (selector.py)
- Fix calculate_max_instances returning 2 when only 0-1 fit (selector.py)
- Atomic downloads via .part file to prevent caching partial files (downloader.py)
- Replace recursive menu navigation with loop-based approach (interactive.py)
- Implement actual majority voting in _majority_vote (consensus.py)
- Fix false-positive list detection in quality scoring (consensus.py)
- Replace 15+ bare except: with except Exception: across codebase
- Fix .json() -> .model_dump_json() for Pydantic v2 (routes.py)
- Remove unused MCP imports, add empty prompt validation (mcp_server.py)
- Use tokenizer for accurate MLX token counting (mlx.py)
- Fix memory estimate from FP32 (*4) to quantized (*0.6) (llamacpp.py)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 20:11:58 +01:00
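The atomic-download fix from this commit follows a standard write-then-rename pattern; a sketch under assumptions (`fetch` stands in for the real HTTP streaming logic in downloader.py):

```python
import os

def download_atomically(url: str, dest: str, fetch) -> None:
    """Stream to dest + '.part' and rename only on success, so an
    interrupted download never leaves a partial file at the final path."""
    part = dest + ".part"
    try:
        with open(part, "wb") as f:
            for chunk in fetch(url):
                f.write(chunk)
        os.replace(part, dest)  # atomic rename on POSIX filesystems
    except Exception:
        if os.path.exists(part):
            os.remove(part)  # clean up the partial file before re-raising
        raise
```

Anything that caches or scans the destination path only ever sees a complete file, which is exactly what the partial-file caching bug required.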
sleepy d68eda45d8 Fix .gitignore to allow src/models/ directory
The .gitignore had 'models/' which excluded both:
- The models/ cache directory at root (intended)
- The src/models/ module directory (NOT intended)

Changed to '/models/' to only exclude root-level models/ directory
while allowing src/models/ to be tracked.

This fixes the 'No module named models' error on fresh clones.
2026-02-23 19:51:40 +01:00
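The two patterns side by side (taken directly from the commit description):

```
# Before: matched every "models" directory at any depth,
# including the tracked src/models/ package.
models/

# After: the leading slash anchors the pattern to the repo root,
# so only the top-level cache directory is ignored.
/models/
```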