Commit Graph

6 Commits

Author SHA1 Message Date
sleepy d30eedaa63 Fix opencode integration: streaming, response format, and tool handling
- Fix streaming to work even when tools are present (was forcing JSON mode)
- Fix response format: use empty list [] instead of null for tool_calls
- Add exclude_none config to ChatMessage model to match OpenAI format
- Remove tool instructions from prompt (they were confusing the 3B model)
- Fix tool call parsing to handle markdown code blocks properly
- Change default instances from 3 to 1 for faster debugging
- Allow 1 instance minimum in interactive config (was 2 on Mac)
- Add debug logging to track requests and responses

Fixes an infinite-loop issue where opencode would retry requests repeatedly
2026-02-24 03:44:46 +01:00
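The markdown-fence fix above could look roughly like this sketch (hypothetical helper; the names and the exact regex are assumptions, not the repo's actual code):

```python
import json
import re

# Matches a ```json ... ``` (or bare ```) fenced block and captures its body.
_FENCE = re.compile(r"```(?:json)?\s*(.*?)\s*```", re.DOTALL)

def parse_tool_call(text: str):
    """Extract a JSON tool call even when the model wraps it in a
    markdown code block; return None when no valid JSON is found."""
    m = _FENCE.search(text)
    raw = m.group(1) if m else text.strip()
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return None
```

Handling both fenced and bare JSON in one place keeps the parser tolerant of small models that format inconsistently.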
sleepy 93f5788d74 feat: Add tool calling support to API
- Add Tool, ToolCall, FunctionDefinition models
- Format prompts with tool descriptions for Qwen models
- Parse tool calls from model output (JSON and function call patterns)
- Auto-disable streaming when tools are present
- Return tool_calls in API response with proper finish_reason
- Support both simple function calls and JSON tool_calls format
2026-02-23 23:08:47 +01:00
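Formatting tool descriptions into the prompt, as mentioned above, might be sketched like this (the template wording is an assumption; the repo's actual prompt format may differ):

```python
import json

def format_tools_for_prompt(tools: list[dict]) -> str:
    """Render OpenAI-style tool definitions into a plain-text block
    that a small instruct model can read."""
    lines = ["You have access to these tools:"]
    for tool in tools:
        fn = tool["function"]
        lines.append(f"- {fn['name']}: {fn.get('description', '')}")
        # JSON-encode the parameter schema so it stays unambiguous.
        lines.append(f"  parameters: {json.dumps(fn.get('parameters', {}))}")
    return "\n".join(lines)
```

A Qwen-family model then sees the tool names and schemas inline and can emit a JSON tool call in response.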
sleepy 472961cc23 feat: Apple Silicon MLX support, sequential workers, live status display, worker names
Major improvements for macOS/Apple Silicon:
- Add spawn-based multiprocessing for Metal GPU compatibility
- Implement sequential generation mode for multiple workers
- Each worker runs one at a time to avoid GPU conflicts
- All workers stay loaded in memory for fast switching

User Experience:
- 100 unique worker names (Alpha, Raven, Zeus, etc.)
- Live terminal status display with progress bars
- Show context usage and last output per worker
- Display IP addresses for network workers

Configuration:
- Default port changed to 17615 (from 8000)
- Context size options: 16K, 32K (default), 64K, 128K
- Offloading options: none, 20%, 50%
- Default max_tokens: 1024

MLX Quantization Support:
- Support 3bit, 4bit, 5bit, 6bit, 8bit MLX models
- Proper memory calculations for each quantization
- Sequential mode automatically enabled on Apple Silicon
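The per-quantization memory calculation reduces to bits-per-parameter arithmetic; a sketch (function name and the overhead simplification are assumptions):

```python
def quantized_weight_bytes(n_params: int, bits: int) -> int:
    """Approximate weight memory for a model quantized to `bits` bits
    per parameter. Ignores KV cache and runtime overhead."""
    if bits not in (3, 4, 5, 6, 8):
        raise ValueError(f"unsupported MLX quantization: {bits}-bit")
    return n_params * bits // 8
```

For example, a 3B-parameter model at 4-bit needs roughly 1.5 GB for weights alone.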

Bug Fixes:
- Fix instance calculation (was always returning 1)
- Fix quantization bit detection for MLX models
- Fix config.json generation in model folders
- Preload MiniLM embedding model during init

Files Changed:
- main.py: Spawn method for macOS, port 17615
- src/backends/mlx.py: MLX generation with stop sequences
- src/models/selector.py: Fix instance calculation
- src/swarm/manager.py: Sequential generation mode
- src/swarm/consensus.py: Preload embedding model
- src/swarm/worker.py: Progress tracking per worker
- src/swarm/worker_names.py: 100 unique names (NEW)
- src/swarm/status_monitor.py: Live display (NEW)
- src/interactive.py: Context/offload menus
- src/models/registry.py: MLX quantization sizes
- src/api/server.py: Port 17615, live status
2026-02-23 22:57:38 +01:00
sleepy e794fe29d4 Fix critical bugs, concurrency issues, and code quality across codebase
- Fix asyncio.create_task() crash in zeroconf background thread (discovery.py)
- Fix int(bytes) TypeError in peer property decoding (discovery.py)
- Fix unreachable Android/Qualcomm GPU detection path (detector.py)
- Add nvmlShutdown() to prevent NVML resource leak (detector.py)
- Wrap blocking inference in asyncio.to_thread() to unblock event loop (llamacpp.py, mlx.py)
- Initialize and use asyncio.Lock for concurrency safety (llamacpp.py)
- Fix VRAM regex matching GPU index instead of byte value (amd.py)
- Implement best_of_n federation strategy (was dead code) (federation.py)
- Lazy-import aiohttp/mcp to avoid hard ImportError (federation.py, mcp_server.py)
- Fix response_model conflict with streaming responses (routes.py)
- Fix CORS allow_origins=* with allow_credentials=True violation (server.py)
- Fix memory calculation using pre-clamped instance count (selector.py)
- Fix calculate_max_instances returning 2 when only 0-1 fit (selector.py)
- Atomic downloads via .part file to prevent caching partial files (downloader.py)
- Replace recursive menu navigation with loop-based approach (interactive.py)
- Implement actual majority voting in _majority_vote (consensus.py)
- Fix false-positive list detection in quality scoring (consensus.py)
- Replace 15+ bare except: clauses with except Exception: across the codebase
- Fix .json() -> .model_dump_json() for Pydantic v2 (routes.py)
- Remove unused MCP imports, add empty prompt validation (mcp_server.py)
- Use tokenizer for accurate MLX token counting (mlx.py)
- Fix memory estimate from FP32 (*4) to quantized (*0.6) (llamacpp.py)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 20:11:58 +01:00
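The atomic-download fix above follows the standard .part-then-rename pattern; a minimal sketch of the idea (helper name is invented, and the real downloader streams from the network rather than taking bytes):

```python
import os

def write_atomic(dest: str, data: bytes) -> None:
    """Write to dest + '.part', then rename into place. os.replace is
    atomic on POSIX, so readers can never observe a partial file under
    the final name."""
    part = dest + ".part"
    with open(part, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())   # ensure bytes hit disk before the rename
    os.replace(part, dest)
```

An interrupted download leaves only a stale .part file, which can be safely deleted or resumed.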
sleepy 2f547fe101 Phase 6: Network Federation (#1)
* Update PLAN.md with new phases

- Add Phase 5: CLI & Interactive Interface
  - Interactive menu system with 3 options
  - Hardware display with detailed specs
  - Resource usage monitoring
  - Custom configuration wizard

- Add Phase 5.5: MCP Server
  - MCP protocol implementation
  - 5 MCP tools for AI assistants
  - Dual server mode (HTTP + MCP)

- Reorganize phase structure for clarity

* Phase 6: Implement network federation (WIP)

Add src/network/discovery.py:
- SwarmDiscovery class using mDNS/Bonjour
- PeerInfo dataclass for peer metadata
- Automatic peer discovery on local network
- Service advertising for this swarm
- Stale peer detection and cleanup
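The PeerInfo dataclass with stale detection might look like this (field names and the 30-second TTL are assumptions, not the repo's actual values):

```python
import time
from dataclasses import dataclass, field

@dataclass
class PeerInfo:
    name: str
    address: str
    port: int
    last_seen: float = field(default_factory=time.time)

    def is_stale(self, ttl: float = 30.0) -> bool:
        # A peer that has not re-announced within `ttl` seconds
        # is considered gone and can be cleaned up.
        return time.time() - self.last_seen > ttl
```

The discovery loop would refresh last_seen on each mDNS announcement and periodically evict stale entries.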

Add src/network/federation.py:
- FederationClient for HTTP communication with peers
- FederatedSwarm for managing cross-swarm consensus
- Two-phase voting: local consensus then peer voting
- Weighted voting strategy based on confidence
- Federation status monitoring
- Peer health checking
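The confidence-weighted voting strategy reduces to summing confidence per candidate answer; a sketch (a simplified stand-in for the FederatedSwarm logic, not its actual implementation):

```python
from collections import defaultdict

def weighted_vote(votes: list[tuple[str, float]]) -> str:
    """votes: (answer, confidence) pairs from local and peer swarms.
    Returns the answer with the highest total confidence."""
    totals: dict[str, float] = defaultdict(float)
    for answer, confidence in votes:
        totals[answer] += confidence
    return max(totals, key=totals.get)
```

Two low-confidence peers agreeing can thus outweigh one high-confidence dissenter, which is the point of cross-swarm consensus.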

Add src/network/__init__.py:
- Export network classes

Update src/api/routes.py:
- POST /v1/federation/vote - Receive votes from peers
- GET /v1/federation/status - Get federation status
- GET /v1/federation/peers - List discovered peers

Update requirements.txt:
- Add zeroconf for mDNS discovery

Features:
- Auto-discovery of other Local Swarm instances
- Cross-swarm consensus voting
- Configurable minimum peer requirements
- Fallback to local-only if no peers available
- Peer health monitoring

TODO:
- Integrate federation into main.py
- Add --federation flag
- Test multi-machine setup
2026-02-23 18:05:27 +01:00
sleepy 4367c79d83 Phase 4: Implement OpenAI-compatible API server
Add src/api/models.py:
- Pydantic models for OpenAI API compatibility
- ChatCompletionRequest/Response models
- Streaming response models (SSE format)
- Model listing and health check models

Add src/api/routes.py:
- POST /v1/chat/completions endpoint
- GET /v1/models endpoint
- GET /health and /v1/health endpoints
- Support for streaming (text/event-stream) and regular responses
- Message formatting for chat prompts
- Error handling with proper HTTP status codes
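The streaming (text/event-stream) framing mentioned above follows the OpenAI chunk format; a sketch of one chunk emitter (helper name and the trimmed payload fields are assumptions, the real response includes more fields such as id and created):

```python
import json

def sse_chunk(model: str, delta: str) -> str:
    """Format one OpenAI-style streaming chunk as a server-sent event:
    a 'data: <json>' line terminated by a blank line."""
    payload = {
        "object": "chat.completion.chunk",
        "model": model,
        "choices": [{"index": 0,
                     "delta": {"content": delta},
                     "finish_reason": None}],
    }
    return f"data: {json.dumps(payload)}\n\n"

SSE_DONE = "data: [DONE]\n\n"  # sentinel that terminates the stream
```

Clients such as opencode concatenate the delta.content fragments and stop when they receive the [DONE] sentinel.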

Add src/api/server.py:
- FastAPI application with CORS middleware
- Lifespan context for startup/shutdown
- Integration with SwarmManager
- Uvicorn server configuration

Update src/api/__init__.py:
- Export API classes and functions

Update main.py:
- Integrate API server into default workflow
- Start API server on http://127.0.0.1:PORT
- Show API endpoints and opencode configuration
- Graceful shutdown on Ctrl+C

Update AGENTS.md:
- Add note about Python support in MCP server

Phase 4 complete: Local Swarm now exposes OpenAI-compatible API at:
- POST /v1/chat/completions (with streaming support)
- GET /v1/models
- GET /health

Ready for use with opencode and other OpenAI-compatible clients.
2026-02-23 17:29:16 +01:00