feat: add --tool-port argument for tool server (default: 17616)
- Tool server now runs on port 17616 by default (separate from main API on 17615)
- Add --tool-port argument to customize tool server port
- Update help text to reflect default port 17616
- Prevent port conflicts when running both services on same machine
@@ -0,0 +1,134 @@
# Local Swarm TODO / Future Enhancements

## Context Window Optimization (For Long Context 30K+)

Based on docs/CONTEXT.md, implement context compression for memory-constrained setups:

### Option 2: Context Compression (Recommended for 16GB VRAM)

**Stage 1: Compression Swarm (3-5 workers)**
- Split 60K input into 6x 10K chunks
- Each worker summarizes one chunk
- Aggregate summaries into 8K compressed context
- Added latency: ~2-3 seconds

**Stage 2: Solution Swarm (N workers)**
- Each worker gets 8K compressed + 2K relevant original
- Generate solutions independently
- Vote on best response

**Benefits:**
- Works with standard 8K-context models
- Maintains swarm consensus architecture
- 2-3x more workers possible

**Implementation:**
```python
# New: CompressionEngine class
class CompressionEngine:
    def compress(self, text: str, target_tokens: int) -> str:
        # Split into chunks
        # Parallel summarization
        # Aggregate results
        pass
```
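The stub above could be fleshed out roughly as follows — a minimal sketch, not the actual implementation; the `summarize_chunk` callable and the ~4 chars/token budget are assumptions made here for illustration:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def split_into_chunks(text: str, chunk_tokens: int, chars_per_token: int = 4) -> List[str]:
    """Approximate token-based chunking by character count."""
    size = chunk_tokens * chars_per_token
    return [text[i:i + size] for i in range(0, len(text), size)]


def compress(text: str, target_tokens: int,
             summarize_chunk: Callable[[str], str], workers: int = 4) -> str:
    """Summarize chunks in parallel, then join summaries into one compressed context."""
    chunks = split_into_chunks(text, chunk_tokens=10_000)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        summaries = list(pool.map(summarize_chunk, chunks))
    compressed = "\n".join(summaries)
    # Naive truncation to the target budget (~4 chars/token)
    return compressed[: target_tokens * 4]
```

In the real pipeline, `summarize_chunk` would be a call into a compression-swarm worker rather than a local function.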

### Option 3: Hierarchical RAG (For 100K+ contexts)

**Tier 1: Indexing**
- Embed context into vector database
- Build searchable knowledge graph

**Tier 2: Retrieval + Generation**
- Query index for relevant context
- Each worker gets ~6K retrieved + 2K raw

**Tier 3: Voting**
- Rerank and consensus

**Use case:** Codebase-wide analysis, large document processing
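A toy sketch of Tiers 1-2, using bag-of-words cosine similarity as a stand-in for real embeddings and a vector database (all names here are illustrative, not part of the codebase):

```python
import math
from collections import Counter
from typing import List, Tuple


def _vec(text: str) -> Counter:
    return Counter(text.lower().split())


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class TieredIndex:
    """Tier 1: index passages. Tier 2: retrieve top-k for a query."""

    def __init__(self) -> None:
        self.passages: List[Tuple[str, Counter]] = []

    def add(self, passage: str) -> None:
        self.passages.append((passage, _vec(passage)))

    def retrieve(self, query: str, k: int = 3) -> List[str]:
        q = _vec(query)
        ranked = sorted(self.passages, key=lambda p: _cosine(q, p[1]), reverse=True)
        return [p[0] for p in ranked[:k]]
```

Tier 3 would then rerank and vote over worker outputs generated from the retrieved passages.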

---

## Tool Execution Enhancements

### Streaming Tool Results
- Stream long file reads progressively
- Show bash command output in real time
- Progress indicators for large operations
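Progressive file reads could be sketched as a generator that yields fixed-size chunks; the function name and chunk size are illustrative assumptions:

```python
from typing import Iterator


def stream_file(path: str, chunk_bytes: int = 65_536) -> Iterator[str]:
    """Yield a large file in chunks so results can stream to the client as they arrive."""
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        while True:
            chunk = f.read(chunk_bytes)
            if not chunk:
                return
            yield chunk
```

Bash output streaming would follow the same shape, yielding from the subprocess pipe instead of a file handle.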

### Tool Permissions
- Configurable permission levels per tool
- Approval required for destructive operations (rm, overwrite)
- Audit log of all tool executions
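One possible shape for permission levels — the tool names and tier assignments below are hypothetical, not the project's actual configuration:

```python
from enum import Enum


class Permission(Enum):
    READ_ONLY = 0    # read/list tools only
    STANDARD = 1     # plus writes inside the workspace
    DESTRUCTIVE = 2  # plus rm/overwrite; would require approval


# Hypothetical per-tool requirements
TOOL_REQUIREMENTS = {
    "read": Permission.READ_ONLY,
    "write": Permission.STANDARD,
    "bash": Permission.DESTRUCTIVE,
}


def is_allowed(tool: str, granted: Permission) -> bool:
    """Unknown tools default to the strictest requirement."""
    required = TOOL_REQUIREMENTS.get(tool, Permission.DESTRUCTIVE)
    return granted.value >= required.value
```

The audit log would then record every `is_allowed` decision alongside the executed tool call.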

### Tool Result Caching
- Cache file reads (hash-based)
- Invalidate on file modification
- Reduce redundant disk I/O
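A minimal sketch of the cache, using mtime/size as a cheap stand-in for content hashing (hashing alone would require reading the file, which is the work being avoided):

```python
import os
from typing import Dict, Tuple


class FileReadCache:
    """Cache file reads; invalidate when the file's mtime or size changes."""

    def __init__(self) -> None:
        # path -> ((mtime, size), cached content)
        self._cache: Dict[str, Tuple[Tuple[float, int], str]] = {}

    def read(self, path: str) -> str:
        stat = os.stat(path)
        stamp = (stat.st_mtime, stat.st_size)
        hit = self._cache.get(path)
        if hit is not None and hit[0] == stamp:
            return hit[1]  # cache hit: skip the disk read
        with open(path, "r", encoding="utf-8") as f:
            content = f.read()
        self._cache[path] = (stamp, content)
        return content
```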

---

## Federation Improvements

### Automatic Peer Discovery
- Better mDNS reliability
- Fallback to broadcast/multicast
- Manual peer list persistence

### Load Balancing
- Distribute requests across peers based on:
  - Current load (active workers)
  - Latency (response time)
  - Capability (model quality)
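The three signals could combine into a single score per peer — the weights below are placeholders to be tuned, not measured values:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Peer:
    name: str
    active_workers: int  # current load
    latency_ms: float    # recent response time
    quality: float       # model capability score, 0..1


def pick_peer(peers: List[Peer], w_load: float = 1.0,
              w_latency: float = 0.01, w_quality: float = 2.0) -> Peer:
    """Lower load and latency are better; higher quality is better."""
    def score(p: Peer) -> float:
        return w_quality * p.quality - w_load * p.active_workers - w_latency * p.latency_ms
    return max(peers, key=score)
```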

### Fault Tolerance
- Automatic peer failover
- Retry with different peers
- Degraded mode (fewer voters)
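Peer failover with retry might look like this sketch (transport details omitted; `request` is any callable that raises on failure — both names are illustrative):

```python
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def call_with_failover(peers: List[str], request: Callable[[str], T]) -> T:
    """Try each peer in order, falling through to the next on failure."""
    last_error: Optional[Exception] = None
    for peer in peers:
        try:
            return request(peer)
        except Exception as exc:  # real code would catch only transport errors
            last_error = exc
    raise RuntimeError(f"all {len(peers)} peers failed") from last_error
```

Degraded mode would wrap this: if fewer peers respond than the voting quorum expects, proceed with the voters that remain.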

---

## UI/UX Enhancements

### Web Dashboard
- Real-time worker status visualization
- Generation progress bars
- Tool execution log viewer
- Configuration management UI

### Better Error Messages
- Clear explanations of OOM errors
- Suggested configurations based on hardware
- Model compatibility checker

---

## Performance Optimizations

### Speculative Decoding
- Small draft model generates tokens
- Large model verifies (2-3x speedup)
- Requires draft model download

### KV Cache Optimization
- PagedAttention (vLLM-style)
- Memory-efficient attention states
- Better long-context performance

### Model Quantization
- Support for GPTQ/AWQ quantization
- 2-3x smaller models with minimal quality loss
- Enable larger models on same hardware

---

## Completed ✓

- [x] Tool execution architecture (local + remote)
- [x] Simplified tool instructions (300 tokens vs 40K)
- [x] Federation with peer discovery
- [x] Hardware auto-detection
- [x] MLX backend for Apple Silicon
- [x] Consensus voting strategies
- [x] Model auto-selection based on VRAM
@@ -197,11 +197,17 @@ Examples:
         action="store_true",
         help="Run as dedicated tool execution server (executes read/write/bash tools)"
     )
+    parser.add_argument(
+        "--tool-port",
+        type=int,
+        default=17616,
+        help="Port for tool execution server (default: 17616)"
+    )
     parser.add_argument(
         "--tool-host",
         type=str,
         default=None,
-        help="URL of tool execution server (e.g., http://192.168.1.10:17616). Tools will be executed remotely."
+        help="URL of tool execution server (default: http://<local-ip>:17616). Tools will be executed remotely."
     )
     parser.add_argument(
         "--version",
@@ -250,13 +256,14 @@ Examples:
             return {"status": "healthy", "mode": "tool-server"}

         host = args.host if args.host else get_local_ip()
-        print(f"🔗 Tool server running at http://{host}:{args.port}")
+        tool_port = args.tool_port
+        print(f"🔗 Tool server running at http://{host}:{tool_port}")
         print(f"   Endpoints:")
         print(f"   - POST /v1/tools/execute")
         print(f"   - GET /health")
         print(f"\n✅ Tool server ready!")

-        uvicorn.run(app, host=host, port=args.port)
+        uvicorn.run(app, host=host, port=tool_port)
         return

     # Determine model configuration