- AGENT_WORKER.md: Added Rule 3 for minimal, maintainable, modular code
- AGENT_REVIEW.md: Added strict enforcement check in Phase 2
- Emphasizes single responsibility, clean interfaces, and production quality
- Reviewers must block code that doesn't meet these standards
Streams the assistant's thinking and tool calls back to opencode immediately:
- Sends content chunks as they're generated
- Parses and sends tool_calls deltas incrementally
- Doesn't execute tools server-side
- Allows opencode to show progress during generation
Note: Real implementation requires fixing syntax errors in routes.py
- Add _execute_webfetch method to ToolExecutor
- Add webfetch to _execute_local tool list
- Update tool_instructions.txt to include webfetch
- Supports text/markdown/html formats
- 30s timeout for web requests
- Import asyncio so the HTTP fetch can run asynchronously
- Detect initial requests (no assistant/tool messages)
- If >4000 tokens, compress aggressively:
- Keep only user messages
- Truncate to 2000 chars if needed
- Replace huge system prompts with minimal instructions
- Log compression stats (original vs final token count)
- Maintains tool functionality while saving ~28k tokens
This allows 16k context models to work with opencode without overflow.
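The compression steps above can be sketched roughly as follows. This is a minimal, hypothetical version: the real code's names and its token counter are not shown in the log, so the chars/4 estimate and the function/constant names here are assumptions.

```python
# Hypothetical sketch of the initial-request compression described above.
# Token counting is approximated as chars/4; all names are illustrative.

MAX_INITIAL_TOKENS = 4000
MAX_USER_CHARS = 2000
MINIMAL_SYSTEM = "You are a coding assistant. Use tools when needed."

def estimate_tokens(messages):
    """Rough token estimate: ~4 characters per token."""
    return sum(len(m.get("content", "")) // 4 for m in messages)

def compress_initial_request(messages):
    """Aggressively compress the first request so 16k-context models fit."""
    # Only compress initial requests: no assistant/tool turns yet.
    if any(m["role"] in ("assistant", "tool") for m in messages):
        return messages
    original = estimate_tokens(messages)
    if original <= MAX_INITIAL_TOKENS:
        return messages
    # Replace huge system prompts with minimal instructions,
    # keep only user messages, truncated if needed.
    compressed = [{"role": "system", "content": MINIMAL_SYSTEM}]
    for m in messages:
        if m["role"] == "user":
            compressed.append({"role": "user",
                               "content": m["content"][:MAX_USER_CHARS]})
    # Log compression stats (original vs final token count).
    print(f"compressed {original} -> {estimate_tokens(compressed)} est. tokens")
    return compressed
```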
Add documentation for the new tool mode options:
- Default local tool server mode (~125 tokens)
- Optional --use-opencode-tools flag (~27k tokens)
Helps users understand the token trade-off between modes.
- Add --use-opencode-tools flag to main.py
- Default: local tool server mode (~125 tokens, saves ~27k tokens)
- Optional: opencode tools mode (~27k tokens, full tool definitions)
- Create .opencodeignore to exclude large docs from context
- Update design doc with token bloat analysis
This allows users to choose between:
- Local tool server: Minimal tool instructions, saves 27k tokens
- Opencode tools: Full tool definitions, more robust but expensive
The status monitor was printing 'Workers Idle' every 2 seconds even when
nothing changed. This caused terminal spam and conflicted with mDNS logs.
Now it only shows status when workers are actually generating, and clears
the display when they become idle.
- Fix DeepSeek Coder: Only 4bit available, 1.3b has no quantizations
- Fix CodeLlama: Use correct 'hf-{quant}bit-mlx' suffix naming
- Fix StarCoder2: 3b/7b only have 4bit, 15b has 4bit/8bit
- Add DeepSeek Coder V2 Lite: New model with 4/6/8bit support
- Update repository naming for all MLX models to match actual HF repos
Verified against HuggingFace mlx-community organization (2025-02-25)
- Add detailed logging for mDNS service discovery
- Log when services are added/removed
- Add diagnostics endpoint at /v1/federation/diagnostics
- Better visibility into why peers aren't discovered
- Keep IP binding to 192.168.x.x as requested
- Add global t/sec metric that includes sync + voting overhead
- Track total time from start to finish across all workers
- Display global performance summary after federation completes
- Reduce default logging level from DEBUG to INFO
- Add tokens_generated to federation API responses
- Update federation vote to report peer t/sec metrics
This allows users to see both individual worker speeds and the
effective speed including synchronization overhead.
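The global metric amounts to simple arithmetic: total tokens across all workers divided by wall-clock time, so sync and voting overhead naturally lower the figure. A sketch (function name is illustrative):

```python
# Illustrative global t/sec: unlike per-worker speeds, this divides the
# total token count by elapsed wall-clock time, which includes the
# synchronization and voting overhead between workers.

def global_tokens_per_sec(worker_tokens, start_time, end_time):
    """worker_tokens: list of tokens_generated per worker."""
    elapsed = end_time - start_time  # includes sync + voting overhead
    return sum(worker_tokens) / elapsed if elapsed > 0 else 0.0

# e.g. 3 workers generated 120, 100, 80 tokens over a 10s window:
print(global_tokens_per_sec([120, 100, 80], 0.0, 10.0))  # 30.0
```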
* optimize(federation): run local and peer generation in parallel
Previously, the federation waited for local generation to complete
before asking peers to generate. This wasted time since peers sat
idle while the host generated.
Now local swarm and all peers generate simultaneously:
- Fire local generation AND peer requests at the same time
- Wait for all to complete with asyncio.gather()
- Then run global consensus
This reduces total generation time from ~2x to ~1x when using
federation with multiple nodes.
Changes:
- Modified generate_with_federation() to run tasks in parallel
- Updated logging to reflect parallel execution
- Added proper error handling for local generation failures
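The parallel scheme can be sketched as below. The coroutine bodies are stand-ins (the real local-swarm, peer, and consensus functions are not shown in the log); the point is firing everything at once and gathering before consensus.

```python
import asyncio

# Sketch of the parallel federation flow: local generation and all peer
# requests start together, then a single gather precedes consensus.
# generate_local / ask_peer / vote are placeholders for the real functions.

async def generate_local(prompt):
    await asyncio.sleep(0.01)  # simulate local swarm generation
    return "local result"

async def ask_peer(peer, prompt):
    await asyncio.sleep(0.01)  # simulate a network round-trip
    return f"{peer} result"

async def vote(candidates):
    return candidates[0]  # placeholder consensus

async def generate_with_federation(prompt, peers):
    # Fire local generation AND peer requests at the same time.
    tasks = [generate_local(prompt)] + [ask_peer(p, prompt) for p in peers]
    # return_exceptions=True keeps one failed node from sinking the batch.
    results = await asyncio.gather(*tasks, return_exceptions=True)
    candidates = [r for r in results if not isinstance(r, Exception)]
    return await vote(candidates)

print(asyncio.run(generate_with_federation("hi", ["peer-a", "peer-b"])))
# prints "local result"
```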
* feat(federation): add federation support to streaming path
Previously, federation only worked with non-streaming requests.
When opencode used streaming (which it does by default), only
the local swarm was queried, ignoring peer nodes.
Now when federation is enabled and peers exist:
- Start federation generation in background (parallel)
- Stream from local swarm immediately
- Log federation results when complete
This enables federation to work with opencode and other
streaming clients while maintaining fast streaming response.
Also added webfetch instructions to prevent hallucinating URLs.
Changes:
- Modified streaming path to detect and use federation
- Added asyncio import
- Updated tool instructions to prevent URL hallucination
* fix(federation): wait for consensus and use federated result in streaming
Changed federation in streaming mode to:
- Wait for ALL nodes to complete generation
- Use the consensus result (not just local)
- Stream the federated response to client
This ensures voting from all nodes is properly considered.
Previous implementation streamed locally while federation ran
in background for logging only, which ignored the consensus.
* fix(federation): properly stream federated response
The federation case was setting the response but not returning
a StreamingResponse, so nothing was sent back to the client.
Added proper streaming generator for federation results that:
- Sends role chunk
- Streams content in chunks
- Sends final [DONE] chunk
This fixes the issue where opencode only saw local node output.
* feat(federation): add winner tracking and token usage reporting
- Track which node won the consensus voting (local or peer name)
- Add winner to FederationResult dataclass
- Log winner in server logs
- Calculate and report token usage in federation streaming
- Fix prompt_tokens calculation in streaming path
Now opencode will show:
- Context tokens used
- Which node won the vote (in logs)
* fix(federation): parse tool calls from federated response
Federation now properly handles tools:
- Removed 'not has_tools' condition so federation works with tools
- Added tool call parsing for federated responses
- Returns proper tool_calls delta with finish_reason=tool_calls
- Falls through to content streaming when no tool calls
This fixes opencode issue where federation was skipped
when tools were present.
* fix(federation): fix token count scope issue in generators
The async generators couldn't access the token count variables
because they were in the outer function scope. Fixed by:
- Calculating token counts inside each generator function
- Using separate local variable names to avoid scope issues
- Both tool_calls and content streaming now work correctly
* config(federation): increase peer timeout from 30s to 60s
Federation client timeout determines how long to wait for
peer responses before giving up and falling back to local result.
Changed from 30s to 60s to give peers more time to respond
especially on slower networks or machines.
* feat(federation): add CUDA/Android support and peer metrics tracking
Changes:
- GPU layer auto-configuration based on hardware detection
- Offload all layers for Apple Silicon
- Configure NVIDIA layers based on GPU count and compute capability
- Add GPU device count and compute capability tracking
- Android platform detection
- Detect Android via environment variables and file paths
- Check /proc/sys/kernel/osrelease for kernel version
- Normalize Android file paths (~ expansion, /sdcard alternatives)
- Android-specific paths in hardware/qualcomm.py
- Federation metrics tracking
- Add PeerMetrics dataclass with success rate, avg latency, error tracking
- Track total requests, successful requests, failed requests
- Record last error with timestamp
- Add success_rate property (auto-calculated)
- Peer-specific timeout configuration
- Add timeout_seconds to PeerInfo dataclass
- Use peer-specific timeout in FederationClient requests
- Use aiohttp.ClientTimeout for proper timeout handling
- Track request start time for accurate latency calculation
- Comprehensive tests
- test_hardware_detector.py: 14 test cases for GPU detection and Android
- test_federation_metrics.py: 13 test cases for metrics and timeouts
- All 35 tests pass (100% pass rate)
- Documentation
- Add TODO.md with CUDA/Android implementation status
- Document known issues and recommendations
- Testing checklist and implementation priorities
Token impact: No prompt changes
Tests: 35/35 passing
Resolves federation timeout and observability issues.
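A plausible shape for the PeerMetrics tracking described above is sketched below. Field and method names follow the commit message, but the real dataclass may differ in detail.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional

# Hedged sketch of PeerMetrics: success rate, average latency, and
# last-error tracking, with success_rate auto-calculated as a property.

@dataclass
class PeerMetrics:
    total_requests: int = 0
    successful_requests: int = 0
    failed_requests: int = 0
    latencies: List[float] = field(default_factory=list)
    last_error: Optional[str] = None
    last_error_time: Optional[float] = None

    @property
    def success_rate(self) -> float:
        if self.total_requests == 0:
            return 0.0
        return self.successful_requests / self.total_requests

    @property
    def avg_latency(self) -> float:
        return sum(self.latencies) / len(self.latencies) if self.latencies else 0.0

    def record(self, ok: bool, latency: float, error: Optional[str] = None):
        self.total_requests += 1
        self.latencies.append(latency)
        if ok:
            self.successful_requests += 1
        else:
            self.failed_requests += 1
            self.last_error = error
            self.last_error_time = time.time()  # error recorded with timestamp
```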
* feat: enhanced tool instructions for multi-step operations
- Add comprehensive examples for ls, find, grep, mkdir, npm init, etc.
- Explain multi-step workflow (explore → read → write)
- Tool system already supports chaining via conversation history
- Bash tool supports: ls, find, grep, cat, mkdir, cd, npm, etc.
- 30 second timeout on commands
- Output limited to 3000 chars for readability
* Cleanup: Consolidate documentation and tidy codebase
Documentation:
- Consolidate 6 markdown files into simplified README.md
- Remove redundant docs: TODO.md, NETWORK.md, REVIEW.md, PLAN.md, CONTEXT.md, GUIDE.md
- Add ARCHITECTURE.md with clean technical overview
- README now focuses on quick start and core concepts
Code verification:
- Verified blocking I/O properly wrapped in asyncio.to_thread()
- Confirmed locks initialized correctly in backends
- AMD VRAM detection uses proper regex (takes max value, not first match)
- All exception handling uses 'except Exception:' (not bare except)
Tool execution improvements (existing changes):
- Better working directory handling with project root detection
- Extended timeouts for package managers (300s)
- Multi-tool call parsing support
- Improved error handling and logging
Note: the system prompt remains a concern - 30k tokens is too large for 16-32k context windows
* docs: add development patterns analysis
Document circular development issues identified in commit history:
- Tool execution went back-and-forth 3+ times (server-side vs client-side)
- Tool instructions churned: 40k tokens → 300 tokens → removed → re-enhanced
- 8+ parsing fixes for same issues (no tests)
- 6 debug-only commits (production debugging)
Provides recommendations to prevent future cycles:
1. Pick one architecture and stick with it
2. Add unit tests before fixes
3. Token budget (<2000 for instructions)
4. One format only (remove alternative parsers)
5. Integration test script
6. Separate concerns into smaller modules
7. Design doc before code changes
8. CI/CD with automated testing
* docs: add comprehensive agent guidelines
AGENT_WORKER.md (600+ lines):
- Pre-flight checklist: token budget, test plan, design doc
- Coding rules: TDD, no debug code, architecture consistency
- Git workflow: branching strategy, commit rules, release process
- Testing requirements: unit (≥80%), integration structure
- Code quality: PEP 8, type hints, max 50 lines per function
- Architecture: no feature flags, separation of concerns
- Continuous learning: research requirements, documentation
- Forbidden patterns: bare except, production debugging, etc.
AGENT_REVIEW.md (400+ lines):
- Review philosophy: prevent circular development
- 6-phase review checklist: structure, quality, tokens, architecture, research, logic
- Report format with token impact analysis
- Severity levels: blocking vs warnings vs approved
- Common issues with examples (good vs bad)
- Review workflow: 30-35 min per PR
- Reports stored in reports/ folder (gitignored)
Also added:
- tests/test_tool_parsing.py - example test following guidelines
- Updated DEVELOPMENT_PATTERNS.md with recommendations
Reports folder in .gitignore for local review storage
* chore: gitignore review reports folder
* feat: fix tool execution and enhance instructions with accurate token counting
- Enhanced tool instructions (1041 tokens, within 2000 budget)
- Added tiktoken>=0.5.0 for accurate token counting
- Fixed subprocess hang by adding stdin=subprocess.DEVNULL
- Removed 9 DEBUG print statements from routes.py
- Added tests for instruction content and token budget verification
- All tests pass (11/11)
Resolves blockers from previous review:
- Token budget verified ✓
- Token documentation added ✓
- Debug code cleaned ✓
- Missing tests added ✓
* feat: implement comprehensive tool system with proper logging
Major improvements to tool instructions and execution:
- Enhanced tool instructions with 7-step task completion workflow
- Added markdown code block fallback parser for tool calls
- Fixed subprocess hang with stdin=subprocess.DEVNULL
- Fixed streaming path to return tool_calls (enabling multi-turn conversations)
- Added complete React project creation example with verification steps
- Token count: 1,743 tokens (within 2,000 limit)
Logging infrastructure:
- Created centralized logging configuration (src/utils/logging_config.py)
- Replaced 80+ print statements with logger.debug()
- Set log level to DEBUG for development
- All modules now use proper logging instead of print
Testing:
- Added 4 new tests for markdown parsing and instruction content
- All 13 tests passing
- Token budget verification test
Documentation:
- Added comprehensive design docs for all major changes
- Added test plans for verification
- Created helper scripts for logging migration
Files changed:
- main.py: Added logging setup
- src/api/routes.py: Tool instructions, streaming fixes, logging
- src/tools/executor.py: subprocess fix, logging
- src/utils/: New logging configuration module
- tests/test_tool_parsing.py: New tests
- docs/: Design decisions and test plans
- scripts/: Helper scripts for development
* refactor: simplify tool instructions to 109 tokens for 7B model
Reduced from 1,743 tokens to 109 tokens (94% reduction) to help
qwen2.5 7B 4bit model follow instructions better.
Changes:
- Removed complex workflow documentation
- Removed multi-turn conversation examples
- Removed lengthy anti-patterns
- Kept only essential format and rules
- Updated tests to match simplified content
Before: 1,743 tokens, 6,004 chars (87% of budget)
After: 109 tokens, 392 chars (5.5% of budget)
This should make it much easier for smaller models to:
1. Understand they must use tools
2. Follow the simple TOOL: format
3. Not get overwhelmed by instructions
* refactor: make tool instructions ultra-direct for 7B models
Further simplify instructions to prevent model from adding explanations.
Before: 109 tokens - model still added explanatory text
After: 86 tokens - ultra-direct commands
Key changes:
- Start with 'You MUST use tools. DO NOT explain.'
- 'OUTPUT THIS EXACT FORMAT - NOTHING ELSE'
- Removed all examples and pleasantries
- Added 'NEVER' rules in all caps
- 'ONLY output TOOL: lines'
The model was outputting:
'1. First, install... TOOL: bash ARGUMENTS: {...}'
Now should output just:
'TOOL: bash
ARGUMENTS: {...}'
This should force the 7B qwen model to stop explaining and just execute.
* refactor: move tool instructions to external config file
Moves hardcoded tool instructions from routes.py to external config file
for better maintainability and easier editing.
Changes:
- Created config/prompts/tool_instructions.txt
- Added _load_tool_instructions() function with caching
- Falls back to default if config file not found
- Updated tests to use the loader function
- Added proper error handling
Benefits:
- Easier to modify instructions without code changes
- Instructions can be edited by non-developers
- Cleaner separation of config vs code
- Supports hot-reloading (cached but easy to invalidate)
Token count: 86 tokens (loaded from file)
Location: config/prompts/tool_instructions.txt
* refactor: simplify tool instructions further and add debug logging
- Reduced instructions to bare minimum: 50 tokens
- Added debug logging to verify instructions are sent
- Removed all caps and aggressive language
- Made instructions more straightforward
Instructions now:
'Use tools to execute commands. Output only tool calls.
Format: TOOL: bash ARGUMENTS: {...}
No explanations. No numbered lists. No markdown. Only tool calls.'
This should be easier for 7B models to follow while still
conveying the essential requirements.
* feat: improve tool parser to handle 7B model output variations
Enhanced parse_tool_calls() with multiple fallback strategies:
1. Standard TOOL:/ARGUMENTS: format (original)
2. Markdown code blocks (fenced with triple backticks)
3. Numbered list items (1. npm install ...)
4. Standalone bash commands (npm, npx, mkdir, etc.)
Now handles messy output from small models like:
'1. Install: npm install -g create-react-app'
'2. Create: create-react-app hello-world'
Parses these into chained bash commands for execution.
Also simplified instructions to 50 tokens minimum:
'Use tools to execute commands. Output only tool calls.
Format: TOOL: bash ARGUMENTS: {...}
No explanations. No numbered lists. No markdown. Only tool calls.'
This combination should make 7B models much more likely to
have their output successfully parsed and executed.
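The layered fallback chain can be sketched as below. The regexes are deliberately simplified relative to the real `parse_tool_calls()`, and the known-command list is abbreviated.

```python
import json
import re

# Illustrative fallback parsing: 1) strict TOOL:/ARGUMENTS: format,
# 2) fenced code blocks treated as bash, 3) known commands extracted
# from numbered-list or standalone lines, chained with &&.

KNOWN_CMDS = ("npm", "npx", "mkdir", "git", "pip", "node", "create-react-app")

def parse_tool_calls(text):
    # 1. Standard TOOL:/ARGUMENTS: format (highest priority).
    calls = [{"name": name, "arguments": json.loads(args)}
             for name, args in re.findall(
                 r"TOOL:\s*(\w+)\s*\nARGUMENTS:\s*(\{.*?\})", text, re.S)]
    if calls:
        return calls
    # 2. Fenced code blocks, treated as bash.
    blocks = re.findall(r"```(?:bash|sh)?\n(.*?)```", text, re.S)
    # 3. Lines containing a known command (e.g. numbered list items).
    pattern = r"\b((?:%s)\b.*)$" % "|".join(KNOWN_CMDS)
    cmds = [m.group(1) for line in text.splitlines()
            if (m := re.search(pattern, line))]
    commands = [b.strip() for b in blocks] or cmds
    if commands:
        # Chain multiple commands with && for sequential execution.
        return [{"name": "bash", "arguments": {"command": " && ".join(commands)}}]
    return []
```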
* fix: improve command extraction for 7B model output
Parser now extracts bash commands from any line containing:
- npm, npx, mkdir, cd, ls, cat, echo, git, python, pip, node, yarn
- create-react-app (added for React projects)
Example: Extracts 'npm install -g create-react-app' from:
'1. Install: npm install -g create-react-app'
Chains multiple commands with && for sequential execution.
This should now successfully parse the numbered list output
from 7B models and execute the commands.
* feat: add bash tool description validation and improve 7B model parsing
Changes:
- Added _ensure_tool_arguments() function to inject 'description' field
- Updated tool_instructions.txt to require description for bash tool
- Improved 7B model command extraction with better regex patterns
- Added 'create-react-app' to command detection list
- Updated delta field type to Dict[str, Any] for streaming
- Added GGUF to MLX quantization mapping for registry.py
- Clarified agent responsibilities in AGENT_REVIEW.md and AGENT_WORKER.md
Fixes:
- Bash tool now validates required 'description' field
- 7B model output parsed more reliably (numbered lists)
- Multiple commands chained with && for sequential execution
Token count: 69 tokens (down from 86, -19.8%)
All tests pass: 13/13
* feat: add webfetch tool support with URL extraction
Changes:
- Added webfetch to tool instructions config
- Added URL extraction pattern to parse_tool_calls()
- Parser now recognizes URLs and creates webfetch tool calls
- Updated token count: 89 tokens (+29% from 69)
The webfetch tool is available through the opencode environment.
System prompt adjustment enables model to use it for URL fetching.
Token budget: 89 tokens (4.45% of 2000 limit)
Tests pass: 13/13
When tools are executed during a streaming request, return the results
as a proper SSE stream instead of non-streaming JSON. This ensures
opencode receives the response in the expected format.
- Stream tool results in chunks
- Include proper SSE format with data: prefix
- End with [DONE] marker
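The SSE framing can be sketched as a generator along these lines; chunk size and field details are illustrative, following the OpenAI chat-completion chunk shape.

```python
import json

# Sketch of the SSE framing described above: a finished tool-result
# string is re-emitted as OpenAI-style chat chunks with a "data: "
# prefix, ending with the [DONE] marker.

def sse_chunks(content, chunk_size=64, model="local"):
    """Yield SSE lines for a completed tool-result string."""
    def frame(delta, finish=None):
        body = {"object": "chat.completion.chunk", "model": model,
                "choices": [{"delta": delta, "finish_reason": finish}]}
        return f"data: {json.dumps(body)}\n\n"
    yield frame({"role": "assistant"})               # role chunk first
    for i in range(0, len(content), chunk_size):
        yield frame({"content": content[i:i + chunk_size]})
    yield frame({}, finish="stop")                   # final finish chunk
    yield "data: [DONE]\n\n"                         # SSE terminator
```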
Replace complex OpenAI-style JSON format with simple format:
TOOL: tool_name
ARGUMENTS: {param: value}
This matches what the tool server expects and is much easier
for smaller models to generate correctly. Also add parser for
this format with priority over other formats.
When streaming is enabled but tools are present:
1. Collect the full response (don't stream yet)
2. Parse for tool calls
3. Execute tools via tool executor
4. Return the tool results as a non-streaming response
This fixes the issue where streaming requests with tools
were bypassing tool execution entirely.
- Parse tool_calls whether it's a single object {...} or array [...]
- Normalize to list for consistent processing
- Add debug logging to trace tool execution flow
- Fix variable name (value_str instead of array_str)
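The normalization step above is small but load-bearing; a sketch:

```python
# Tiny sketch of the normalization: parsed tool_calls may arrive as a
# single object {...} or an array [...]; downstream code always gets a list.

def normalize_tool_calls(parsed):
    if parsed is None:
        return []
    return parsed if isinstance(parsed, list) else [parsed]

print(normalize_tool_calls({"name": "bash"}))    # [{'name': 'bash'}]
print(normalize_tool_calls([{"name": "read"}]))  # [{'name': 'read'}]
```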
- Execute tools server-side using configured tool executor (local or remote)
- Return tool results as content directly
- Add logging to show which tool executor is being used
- This should make tool execution work with opencode's broken tool support
Revert to proper OpenAI tool flow:
1. LLM Server returns tool_calls with finish_reason='tool_calls'
2. opencode executes tools (can use tool-server if configured)
3. opencode sends tool result back to LLM Server
4. LLM Server generates final response
This allows opencode to handle tool execution and retry logic.
- Log when no tool executor is configured (fallback to local)
- Log whether using remote tool host or local execution
- Help diagnose why tool requests aren't reaching the tool server
- --tool-host with no value: auto-detects local IP (e.g., http://192.168.1.5:17616)
- --tool-host with explicit URL: uses provided URL
- No --tool-host: local tool execution (default)
Example usage:
python main.py --auto --tool-host # Auto-detect local IP
python main.py --auto --tool-host http://192.168.1.10:17616 # Explicit URL
python main.py --auto # Local execution
- Tool server now runs on port 17616 by default (separate from main API on 17615)
- Add --tool-port argument to customize tool server port
- Update help text to reflect default port 17616
- Prevent port conflicts when running both services on same machine
Reduce tool instructions from 40k tokens to ~300 tokens:
- List only 3 main tools (read, write, bash) with brief descriptions
- Single concise JSON format example
- Remove verbose formatting and multiple examples
- Only add instructions on first request (no assistant response yet)
This makes tool usage feasible for 8K-32K context models,
especially important for home setups with limited VRAM.
- Add ToolExecutor class supporting both local and remote tool execution
- Add --tool-host argument to use remote tool execution server
- Add --tool-server argument to run dedicated tool execution server
- Add /v1/tools/execute endpoint for remote tool execution
- Workers can execute tools on centralized tool host
- Tools: read, write, bash with security restrictions
Architecture:
- Tool Host (--tool-server): Runs on one machine, executes all tools
- Workers (--tool-host): Send tool requests to tool host, get results
- Local mode (default): Execute tools locally as before
- Remove tool instructions from system prompt (they were confusing the 3B model)
- Allow streaming even when tools are present
- Model now responds normally, server parses and executes tools server-side
- Fixes infinite loop where opencode would retry requests repeatedly
Based on commit d30eedaa63 which originally fixed this.
Instead of complex checks for tool_calls in various formats, simply
check if any assistant message exists in the conversation. If the
assistant has already responded, don't add tool instructions again.
This prevents the conversation from growing with duplicate messages.
Add check for 'Tool X result' pattern in assistant messages to detect
when server-side tool execution has already occurred. This prevents
the conversation from growing with duplicate user messages.
Execute tools server-side instead of relying on client (opencode) to
execute them. This works around known bugs with OpenAI-compatible
providers and tool calling in opencode.
Supported tools: read, write, bash, question, skill, todowrite, todoread
- Parse model ID with format like qwen2.5-coder:7b:4bit
- Return specific error if requested config not found or doesn't fit
- Don't fall back to auto-selection when specific config requested
- Fix federation branch to properly return response instead of falling through
- Add detailed debug output showing the full API response
- Show finish_reason, tool_calls_count, and full JSON response
Check if assistant has already generated tool_calls (either via
tool_calls field or in content) and don't add instructions if so.
This prevents the model from continuing to call tools after the
first tool execution.
- Fix regex to properly extract function content with nested braces
- Fix arguments extraction to handle escaped quotes correctly
- Arguments are now properly unescaped and parsed as JSON
- Tool calls now include correct arguments field for opencode to execute
- Check if there are already tool results (role=tool) in messages
- Only add tool instructions if no tool results present
- Prevents model from generating more tool calls after tool execution
- Model should now respond to tool results instead of calling more tools
- Search for the pattern in cleaned_text but extract position from original
- Strip trailing markdown block markers from content
- Ensure content is empty when tool_calls are present
When tools are provided and streaming is requested, fall through to
non-streaming mode so tool calls can be properly parsed and returned.
Remove verbose input debug logging.
- Look for { tool_calls: [...] } pattern instead of just tool_calls:
- Better handle unquoted keys AND unquoted string values
- Add debug output when parsing fails
- Fix regex to match the full JSON structure
- Add tool call parsing when using federation (was missing)
- Add debug output showing which peers are being contacted
- Fix variable shadowing in tool_calls parsing