feat: add --tool-port argument for tool server (default: 17616)

- Tool server now runs on port 17616 by default (separate from main API on 17615)
- Add --tool-port argument to customize tool server port
- Update help text to reflect default port 17616
- Prevent port conflicts when running both services on same machine
2026-02-24 14:27:40 +01:00
parent bad8732b7b
commit b5bd154ba6
2 changed files with 144 additions and 3 deletions
@@ -0,0 +1,134 @@
# Local Swarm TODO / Future Enhancements
## Context Window Optimization (For 30K+ Token Contexts)
Based on docs/CONTEXT.md, implement context compression for memory-constrained setups:
### Option 2: Context Compression (Recommended for 16GB VRAM)
**Stage 1: Compression Swarm (3-5 workers)**
- Split 60K input into 6x 10K chunks
- Each worker summarizes one chunk
- Aggregate summaries into 8K compressed context
- Added latency: ~2-3 seconds
**Stage 2: Solution Swarm (N workers)**
- Each worker gets 8K compressed + 2K relevant original
- Generate solutions independently
- Vote on best response
**Benefits:**
- Works with standard 8K models
- Maintains swarm consensus architecture
- 2-3x more workers possible
**Implementation:**
```python
# New: CompressionEngine class
class CompressionEngine:
    def compress(self, text: str, target_tokens: int) -> str:
        """Compress text down to roughly target_tokens."""
        # 1. Split the input into fixed-size chunks
        # 2. Summarize the chunks in parallel (compression swarm)
        # 3. Aggregate the summaries into one compressed context
        raise NotImplementedError
```
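The Stage 1 chunking step could start from something like this (a sketch; whitespace tokens stand in for model tokens, and real code would use the backend's tokenizer):

```python
def split_into_chunks(text: str, chunk_tokens: int = 10_000) -> list[str]:
    """Split text into chunks of roughly chunk_tokens each.

    Whitespace-separated words approximate model tokens here.
    """
    words = text.split()
    return [
        " ".join(words[i:i + chunk_tokens])
        for i in range(0, len(words), chunk_tokens)
    ]
```

With 10K-token chunks, a 60K-token input yields the six chunks Stage 1 describes, one per compression worker.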
### Option 3: Hierarchical RAG (For 100K+ contexts)
**Tier 1: Indexing**
- Embed context into vector database
- Build searchable knowledge graph
**Tier 2: Retrieval + Generation**
- Query index for relevant context
- Each worker gets ~6K retrieved + 2K raw
**Tier 3: Voting**
- Rerank and consensus
**Use case:** Codebase-wide analysis, large document processing
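A toy version of the Tier 2 retrieval step, with word overlap standing in for vector similarity (illustrative only; a real Tier 1 index would use embeddings and a vector database):

```python
def retrieve(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return the k chunks with the most query-word overlap.

    A bag-of-words stand-in for embedding search.
    """
    q = set(query.lower().split())
    ranked = sorted(
        chunks,
        key=lambda c: len(q & set(c.lower().split())),
        reverse=True,
    )
    return ranked[:k]
```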
---
## Tool Execution Enhancements
### Streaming Tool Results
- Stream long file reads progressively
- Show bash command output in real-time
- Progress indicators for large operations
### Tool Permissions
- Configurable permission levels per tool
- Approval required for destructive operations (rm, overwrite)
- Audit log of all tool executions
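A minimal gate for the approval rule could look like this (a sketch; the tool names and the destructive-command list are illustrative, not the project's actual tool registry):

```python
DESTRUCTIVE_COMMANDS = {"rm", "rmdir", "mv", "dd"}

def requires_approval(tool: str, command: str = "") -> bool:
    """True if the tool call should pause for explicit user approval.

    Write tools always need approval; bash commands are checked
    against a denylist of destructive binaries (illustrative set).
    """
    if tool == "write":
        return True
    if tool == "bash":
        parts = command.split()
        return bool(parts) and parts[0] in DESTRUCTIVE_COMMANDS
    return False
```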
### Tool Result Caching
- Cache file reads (hash-based)
- Invalidate on file modification
- Reduce redundant disk I/O
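One way to sketch the cache, using mtime as a cheap invalidation signal (a stricter version would compare a content hash as the item above suggests; `cached_read` is a hypothetical helper, not existing code):

```python
import os

_cache: dict[str, tuple[float, str]] = {}

def cached_read(path: str) -> str:
    """Read a file, reusing cached content until its mtime changes."""
    mtime = os.path.getmtime(path)
    hit = _cache.get(path)
    if hit is not None and hit[0] == mtime:
        return hit[1]  # cache hit: skip disk I/O
    with open(path) as f:
        content = f.read()
    _cache[path] = (mtime, content)
    return content
```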
---
## Federation Improvements
### Automatic Peer Discovery
- Better mDNS reliability
- Fallback to broadcast/multicast
- Manual peer list persistence
### Load Balancing
- Distribute requests across peers based on:
- Current load (active workers)
- Latency (response time)
- Capability (model quality)
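The three criteria above could combine into a single cost score like this (the weights are illustrative assumptions, not tuned values, and the peer dict shape is hypothetical):

```python
def pick_peer(peers: list[dict]) -> dict:
    """Choose the peer with the lowest combined cost.

    Each peer dict holds "load" (active workers), "latency_ms",
    and "quality" (0-1). Weights below are illustrative only.
    """
    def cost(p: dict) -> float:
        # lower load and latency are better; higher quality is better
        return p["load"] + p["latency_ms"] / 100.0 - 2.0 * p["quality"]
    return min(peers, key=cost)
```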
### Fault Tolerance
- Automatic peer failover
- Retry with different peers
- Degraded mode (fewer voters)
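The failover-and-retry flow could be sketched as follows (`send` stands in for whatever transport the federation layer uses; its signature here is an assumption):

```python
def call_with_failover(peers: list[str], request: dict, send) -> dict:
    """Try peers in order; any transport failure falls through to the next.

    Assumed signature: send(peer, request) -> response dict.
    """
    last_error = None
    for peer in peers:
        try:
            return send(peer, request)
        except (ConnectionError, TimeoutError) as exc:
            last_error = exc  # remember the failure, try the next peer
    raise RuntimeError(f"all {len(peers)} peers failed") from last_error
```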
---
## UI/UX Enhancements
### Web Dashboard
- Real-time worker status visualization
- Generation progress bars
- Tool execution log viewer
- Configuration management UI
### Better Error Messages
- Clear explanations of OOM errors
- Suggested configurations based on hardware
- Model compatibility checker
---
## Performance Optimizations
### Speculative Decoding
- Small draft model generates tokens
- Large model verifies (2-3x speedup)
- Requires draft model download
### KV Cache Optimization
- PagedAttention (vLLM-style)
- Memory-efficient attention states
- Better long-context performance
### Model Quantization
- Support for GPTQ/AWQ quantization
- 2-3x smaller models with minimal quality loss
- Enable larger models on same hardware
---
## Completed ✓
- [x] Tool execution architecture (local + remote)
- [x] Simplified tool instructions (~300 tokens vs 40K)
- [x] Federation with peer discovery
- [x] Hardware auto-detection
- [x] MLX backend for Apple Silicon
- [x] Consensus voting strategies
- [x] Model auto-selection based on VRAM
@@ -197,11 +197,17 @@ Examples:
         action="store_true",
         help="Run as dedicated tool execution server (executes read/write/bash tools)"
     )
+    parser.add_argument(
+        "--tool-port",
+        type=int,
+        default=17616,
+        help="Port for tool execution server (default: 17616)"
+    )
     parser.add_argument(
         "--tool-host",
         type=str,
         default=None,
-        help="URL of tool execution server (e.g., http://192.168.1.10:17616). Tools will be executed remotely."
+        help="URL of tool execution server (default: http://<local-ip>:17616). Tools will be executed remotely."
     )
     parser.add_argument(
         "--version",
@@ -250,13 +256,14 @@ Examples:
         return {"status": "healthy", "mode": "tool-server"}
 
     host = args.host if args.host else get_local_ip()
-    print(f"🔗 Tool server running at http://{host}:{args.port}")
+    tool_port = args.tool_port
+    print(f"🔗 Tool server running at http://{host}:{tool_port}")
     print(f" Endpoints:")
     print(f" - POST /v1/tools/execute")
     print(f" - GET /health")
     print(f"\n✅ Tool server ready!")
-    uvicorn.run(app, host=host, port=args.port)
+    uvicorn.run(app, host=host, port=tool_port)
     return
 
 # Determine model configuration