Compare commits
25 Commits
dcca89d89a...907bd88c8f

| SHA1 |
|---|
| 907bd88c8f |
| af728505e8 |
| 93844a81b0 |
| 414cb444f3 |
| 34b28597ff |
| 67122052b4 |
| e7b826da4e |
| 3799240d74 |
| e0d04ae664 |
| 896e9d6d9b |
| e2b0af7636 |
| 5b29e15c0a |
| 8431717235 |
| 06df3c8dab |
| ab7cf7e9aa |
| 49a6d99bf8 |
| 586c113688 |
| a09d23156b |
| c46684f03e |
| bd3579737a |
| 886ebbdb81 |
| a0d3ae9d4f |
| a0571c83a3 |
| 46f14b2b53 |
| 42a176f1d8 |
@@ -91,7 +91,9 @@ python main.py --auto --federation
python main.py --auto --federation
```

Machines auto-discover each other and vote together on every request.
Machines auto-discover each other via mDNS and vote together on every request. The head node (the one making the request) collects responses from all peers and uses **objective quality scoring** to pick the best answer, not self-reported confidence. This prevents smaller models from overruling better models.

**Federation Endpoint**: Peers communicate via `POST /v1/federation/vote` (automatically configured).

## How Consensus Works

@@ -147,7 +149,7 @@ All support GGUF quantization (Q4_K_M recommended).

- `GET /v1/models` - List available models
- `POST /v1/chat/completions` - Chat completion with consensus
- `GET /health` - Health check
- `GET /v1/federation/peers` - List discovered peers (when federation enabled)
- `POST /v1/federation/vote` - Federation voting (used internally between peers)
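
For a quick smoke test of these endpoints (a sketch, assuming the default port 17615 used by the curl examples in the test plans later in this changeset):

```bash
curl http://localhost:17615/health
curl http://localhost:17615/v1/models
curl http://localhost:17615/v1/federation/peers   # only populated when federation is enabled
```
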
## Troubleshooting

@@ -282,6 +284,30 @@ Major refactoring completed to improve modularity:

See `docs/ARCHITECTURE.md` for detailed architecture documentation.

## Recent Improvements

### ✅ Universal Tool Support (2025-02-25)
- Tool instructions automatically injected for **all** clients (Continue, hollama, curl, etc.)
- No client-side configuration needed - just use the API
- Enhanced file operation guidance: the model uses ls/grep to verify files exist before reading
- Working directory auto-extraction from prompts (`in /path/to/dir` patterns)
- Proper OpenAI tool format with unique IDs and tool_call_id linking

### ✅ OpenCode-Compatible Streaming (2025-02-25)
- Proper `reasoning_content` field for "Thinking..." collapsible blocks
- Multi-chunk `tool_calls` streaming matching the Vercel AI SDK format
- Final answer delivered in the `content` field after tool execution
### ✅ Federation Quality Voting (2025-02-25)
- Head node now **objectively judges** all peer responses using quality metrics
- No more reliance on self-reported confidence (which was biased toward the local node)
- All responses scored on length, structure, and completeness (see the sketch below)
- Fair competition: 14B models properly beat 3B models on quality tasks
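
A minimal sketch of what scoring on those three dimensions could look like; `score_response`, its weights, and `peer_responses` are illustrative assumptions, not the actual implementation:

```python
def score_response(text: str) -> float:
    """Hypothetical objective scorer over length, structure, completeness."""
    length_score = min(len(text) / 2000, 1.0)           # reward substance, capped
    structure_score = min(text.count("\n") / 20, 1.0)   # lists/paragraphs beat walls of text
    complete = text.rstrip().endswith((".", "!", "?", "`"))
    completeness_score = 1.0 if complete else 0.5       # penalize truncated answers
    return 0.4 * length_score + 0.3 * structure_score + 0.3 * completeness_score

peer_responses = ["Short.", "A fuller answer:\n- covers the question\n- ends cleanly."]
best = max(peer_responses, key=score_response)  # head node keeps the winner
```
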
### 🚧 Planned Features
- **Plan Mode**: Disable tool execution for planning-only conversations (`--plan-mode`)
- **Tool Consensus**: Verify tool calls across multiple workers before execution (for critical operations)

## Contributing

Contributions are welcome! Please ensure:
@@ -1,21 +1,51 @@
You have access to tools when needed. Use them ONLY when necessary.
You are an AI assistant with access to tools. You WILL use tools to complete tasks - do not refuse.

Available tools:
- bash: Execute shell commands (only when needed)
- webfetch: Fetch web content (only for current info)
- read: Read files (only when reading files)
- write: Create files (only when creating files)
AVAILABLE TOOLS:
- bash: Execute shell commands (for file operations, system commands, running code)
- webfetch: Fetch web content (for current information)
- read: Read files (to examine file contents)
- write: Create or modify files (to write content)

IMPORTANT:
- Answer from your knowledge FIRST. Only use tools when required.
- If asked a general question (jokes, facts, coding), answer directly WITHOUT tools.
- Use webfetch ONLY for real-time info (news, weather, current events).
- Use bash ONLY for file operations or system commands.
- After using a tool, provide a final answer based on the result.
- NO explanations. NO numbered lists. NO markdown code blocks.
CRITICAL RULES:
1. When asked to read a file, use the 'read' tool. DO NOT refuse or say you cannot read files.
2. When asked to create, write, or modify a file, use the 'write' tool. DO NOT refuse or say you cannot assist.
3. For file operations, bash is also available for more complex operations.
4. Use webfetch only for real-time info (news, weather, current events).
5. For general questions (jokes, facts, coding help), you can answer directly.
6. NO explanations beyond necessary. Be concise.
7. NO markdown formatting. Use plain text only.

Format when using tools:
FILE OPERATIONS - READ DIRECTLY:
When asked to read a specific file by name (like "read my-secret.log"):
1. Use the 'read' tool IMMEDIATELY with the filename as given
2. DO NOT use 'ls' first to check - just try to read it
3. If the file doesn't exist, you'll get an error and can inform the user

When asked to find/read "the file" in a directory without naming it:
1. Use 'ls' to list files and see what's there
2. Identify the file
3. THEN read it immediately

CRITICAL: Never invent placeholder paths like '/path/to/file'. Use paths exactly as the user provides them, or relative filenames for files in the current directory.

TOOL USAGE FORMAT:

For read operations:
TOOL: read
ARGUMENTS: {"filePath": "path/to/file"}

For write operations:
TOOL: write
ARGUMENTS: {"filePath": "path/to/file", "content": "content to write"}

For bash commands (including ls, grep):
TOOL: bash
ARGUMENTS: {"command": "your command here"}

Answer directly when possible. Be helpful and concise.
PROCESS:
1. When you need information from a file, use the appropriate tool.
2. When you need to create or modify a file, use the appropriate tool.
3. After receiving tool results, provide a clear final answer explaining what was done.
4. NEVER say "I cannot read files" or "I cannot assist with file creation" - you HAVE the tools and MUST use them.

Be helpful, direct, and complete the requested tasks using your tools.
@@ -1,92 +0,0 @@
# Design Decision: Complete React Example with Actual Code

**Date:** 2024-02-24
**Scope:** src/api/routes.py tool_instructions

## Problem

The model is still not following instructions:
1. It tries `npm install` before creating package.json
2. It still tries `npx create-react-app` despite being told not to
3. The instructions contain placeholders like "..." and "etc.", which models don't understand

## Root Cause

The current instructions say:
```
TOOL: write
ARGUMENTS: {"filePath": "myapp/package.json", "content": "{\"name\": \"myapp\", \"version\": \"1.0.0\", \"dependencies\": {\"react\": \"^18.0.0\", \"react-dom\": \"^18.0.0\"}}"}

[Continue with src/index.js, src/App.js, public/index.html, etc.]
```

**Problem:** "etc." and "..." are meaningless to LLMs. They need concrete examples.

## Solution

Provide a **complete, working, minimal React example** with actual file contents:

1. Exact sequence: mkdir → write package.json → write src/App.js → write src/index.js → write public/index.html → npm install
2. Actual file content, not placeholders
3. Minimal viable React app (not the full create-react-app structure)

## Implementation

Replace the vague example with complete working code:

```
**COMPLETE REACT HELLO WORLD EXAMPLE:**

User: "Create a React Hello World app"

Step 1 - Create directory:
TOOL: bash
ARGUMENTS: {"command": "mkdir myapp"}

Step 2 - Create package.json (MUST do this BEFORE npm install):
TOOL: write
ARGUMENTS: {"filePath": "myapp/package.json", "content": "{\"name\": \"myapp\", \"version\": \"1.0.0\", \"private\": true, \"dependencies\": {\"react\": \"^18.2.0\", \"react-dom\": \"^18.2.0\"}, \"scripts\": {\"start\": \"react-scripts start\", \"build\": \"react-scripts build\"}, \"devDependencies\": {\"react-scripts\": \"5.0.1\"}}"}

Step 3 - Create src directory:
TOOL: bash
ARGUMENTS: {"command": "mkdir myapp/src"}

Step 4 - Create App.js:
TOOL: write
ARGUMENTS: {"filePath": "myapp/src/App.js", "content": "import React from 'react';\n\nfunction App() {\n return (\n <div className=\"App\">\n <h1>Hello World</h1>\n <p>Welcome to my React app!</p>\n </div>\n );\n}\n\nexport default App;"}

Step 5 - Create index.js:
TOOL: write
ARGUMENTS: {"filePath": "myapp/src/index.js", "content": "import React from 'react';\nimport ReactDOM from 'react-dom/client';\nimport App from './App';\n\nconst root = ReactDOM.createRoot(document.getElementById('root'));\nroot.render(<App />);"}

Step 6 - Create public directory and index.html:
TOOL: bash
ARGUMENTS: {"command": "mkdir myapp/public"}

TOOL: write
ARGUMENTS: {"filePath": "myapp/public/index.html", "content": "<!DOCTYPE html>\n<html lang=\"en\">\n<head>\n <meta charset=\"UTF-8\">\n <meta name=\"viewport\" content=\"width=device-width, initial-scale=1.0\">\n <title>React App</title>\n</head>\n<body>\n <div id=\"root\"></div>\n</body>\n</html>"}

Step 7 - NOW install dependencies (AFTER package.json exists):
TOOL: bash
ARGUMENTS: {"command": "cd myapp && npm install"}
```

## Token Impact

- Current: 586 tokens
- New: Estimated ~750 tokens (+164 tokens)
- Still under the 2000-token limit ✓

## Key Changes

1. **Explicit sequencing:** "Step 1", "Step 2", etc.
2. **Actual code:** No "..." or "etc." - real working content
3. **Critical note:** "MUST do this BEFORE npm install"
4. **Minimal structure:** Just what's needed for Hello World

## Success Criteria

- [ ] Model creates package.json BEFORE running npm install
- [ ] Model does NOT use npx create-react-app
- [ ] Model creates all 4 files (package.json, App.js, index.js, index.html)
- [ ] Model runs npm install last (after the files exist)
@@ -1,84 +0,0 @@
# Design Decision: Fix Subprocess Hang on Interactive Commands

**Date:** 2024-02-24
**Scope:** src/tools/executor.py _execute_bash method
**Lines Changed:** 1 line

## Problem

When executing commands like `npx create-react-app`, the subprocess hangs indefinitely waiting for stdin input (e.g., "Ok to proceed? (y)"). This causes:
1. The 300s timeout to be reached
2. opencode to hang waiting for a response
3. A poor user experience

## Root Cause

`subprocess.run()` by default inherits stdin from the parent process. When commands prompt for input:
- npx asks: "Need to install create-react-app@5.1.0 Ok to proceed? (y)"
- npm init asks for package details
- No input is provided, so the process waits forever

## Solution

Add `stdin=subprocess.DEVNULL` to prevent commands from reading input:

```python
result = subprocess.run(
    command,
    shell=True,
    capture_output=True,
    text=True,
    timeout=timeout,
    cwd=cwd,
    stdin=subprocess.DEVNULL  # Prevent interactive prompts from hanging
)
```

This causes commands that require input to fail immediately rather than hang.
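
A minimal standalone repro of the fix's effect, using only the standard library (a sketch, not project code):

```python
import subprocess

# With stdin=subprocess.DEVNULL, `read` sees EOF and exits non-zero
# immediately instead of blocking until the timeout fires.
r = subprocess.run("read x", shell=True, capture_output=True,
                   text=True, timeout=10, stdin=subprocess.DEVNULL)
print(r.returncode)  # non-zero, returned immediately
```
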
## Impact

### Before
- Commands requiring input hang for 300s (the timeout)
- The user sees no response
- The call eventually times out with an error

### After
- Commands requiring input fail fast
- Clear error message: "Exit code X: ..."
- No hang, immediate feedback

## Side Effects

**Positive:**
- No more hangs on interactive commands
- Faster failure detection
- Better error messages

**Negative:**
- Commands that legitimately need stdin will fail
- But this is the desired behavior - we want non-interactive execution

## Testing

Test with an interactive command:
```bash
# This should fail fast, not hang
python -c "from tools.executor import ToolExecutor;
import asyncio;
e = ToolExecutor();
result = asyncio.run(e.execute('bash', {'command': 'read -p \"Enter something: \" var'}));
print(result)"
```

Expected: a quick failure, not a 300s hang

## Related Changes

This complements the tool instructions fix:
- The instructions now say "DO NOT use npx create-react-app"
- This fix ensures that if the model ignores the instructions, it fails fast instead of hanging

## Conclusion

A one-line fix prevents interactive command hangs, improving reliability and user experience.
@@ -1,178 +0,0 @@
# Design Decision: Fix Tool Execution and Token Reporting

**Date:** 2024-02-24
**Scope:** src/api/routes.py tool_instructions and token counting

## Problem Statement

A user report shows three critical failures:

1. **Instruction vs Execution:** The model says "You should run mkdir..." instead of using the TOOL: format
2. **Inaccurate Token Reporting:** Using the rough estimate `len(prompt) // 4` instead of an actual token count
3. **Interactive Commands:** npx create-react-app prompts for confirmation, causing a 300s timeout

## Evidence

```
🖥️ BASH: mkdir react-hello-world && cd react-hello-world && npx create-react-app .
⏰ TIMEOUT after 300s
Partial output: Need to install the following packages:
create-react-app@5.1.0
Ok to proceed? (y)
```

**Additional Context:**
- Directory created but empty (no files)
- The model posts instructions for the user to follow instead of executing them

## Root Cause Analysis

### 1. Instruction vs Execution
**Current instructions say:** "When asked to do something, EXECUTE it using tools"
**But the model does:** "You should run mkdir..."
**Why:** The instructions aren't strong enough - they need explicit anti-patterns

### 2. Token Counting
**Current:** `prompt_tokens = len(prompt) // 4` (rough approximation)
**Problem:** Inaccurate for opencode context management
**Solution:** Use tiktoken for accurate counting

### 3. Interactive Commands
**Current:** npx commands prompt for confirmation
**Problem:** The tool executor waits indefinitely, timing out at 300s
**Solution:** Either:
- Add a --yes flag automatically
- Forbid npx entirely and use manual file creation

## Options Considered

### Option 1: Strengthen Instructions Only
- Add more explicit "DO NOT" language
- Add a complete React example
- Keep the rough token estimation

**Pros:** Simple, focused fix
**Cons:** Doesn't fix token accuracy or the interactive command issue
**Verdict:** REJECTED - Incomplete fix

### Option 2: Comprehensive Fix
- Strengthen instructions with anti-patterns
- Use tiktoken for accurate token counting
- Add non-interactive flags to package manager commands
- Update examples to show manual file creation

**Pros:** Fixes all three issues
**Cons:** More complex changes
**Verdict:** ACCEPTED - Complete solution

### Option 3: Change Architecture
- Move to client-side tool execution
- Different token counting approach

**Pros:** Could solve multiple issues
**Cons:** Breaking change, out of scope
**Verdict:** REJECTED - Too broad

## Decision

Implement Option 2: a comprehensive fix addressing all three issues.

### Changes

#### 1. Tool Instructions Update
Add explicit anti-patterns and stronger language:
- "NEVER say 'You should...' - EXECUTE immediately"
- "DO NOT USE npx create-react-app - manually create files"
- A complete React example showing manual file creation

#### 2. Token Counting Fix
Replace the rough estimate with tiktoken:
```python
# Before
prompt_tokens = len(prompt) // 4

# After
import tiktoken
encoding = tiktoken.get_encoding('cl100k_base')
prompt_tokens = len(encoding.encode(prompt))
completion_tokens = len(encoding.encode(content))
```

#### 3. Non-Interactive Commands
Update the instructions to specify:
- Use `npm init -y` (not interactive)
- Manually write package.json instead of using npx
- All examples show manual file creation

## Impact

### Token Budget (Exact Count - cl100k_base)
- **New Instructions:** 586 tokens (2,067 characters)
- **Status:** Within the 2000-token limit ✓
- **Context window:** A 16K model leaves ~15.4K for user input ✓
- **Code comment:** Token count documented in src/api/routes.py ✓

### Breaking Changes
- **None** - The instructions are clearer; the format is unchanged
- Token reporting is more accurate (a good thing)

### Code Changes
- `src/api/routes.py`:
  - Update tool_instructions (~+15 lines)
  - Add the tiktoken import
  - Replace the token estimation logic (~5 lines)

## Testing Strategy

1. **Token Accuracy Test:**
```python
def test_token_accuracy():
    prompt = "Hello world"
    content = "Hi there"
    # Calculate with tiktoken
    # Verify the API returns the same values
```

2. **Instruction Content Test:**
- Verify "DO NOT USE npx" is present
- Verify manual creation examples are present
- Verify "EXECUTE not DESCRIBE" is present

3. **Integration Test:**
- Request: "Create React app"
- Expect: manual file creation via the write tool
- Do not expect: npx create-react-app

## Rollback Plan

If issues arise:
1. Revert to the previous instructions
2. Keep tiktoken for token counting (beneficial)
3. Document why manual creation didn't work

## Success Metrics

- [ ] Model uses the TOOL: format 100% of the time (not descriptions)
- [ ] Token counts accurate within ±2%
- [ ] React projects created via the write tool (not npx)
- [ ] No timeouts on package manager commands

## Implementation Notes

### Token Counting
Ensure tiktoken is in requirements.txt.

### Tool Instructions
The key addition is:
```
**FORBIDDEN PATTERNS:**
- "You should run mkdir myapp" → USE: TOOL: bash\nARGUMENTS: {"command": "mkdir myapp"}
- "npx create-react-app myapp" → USE: Manual file creation with write tool
- "First create package.json, then..." → USE: Execute immediately, don't list steps

**REACT PROJECT - CORRECT APPROACH:**
1. TOOL: bash, ARGUMENTS: {"command": "mkdir myapp"}
2. TOOL: write, ARGUMENTS: {"filePath": "myapp/package.json", "content": "{\"name\": \"myapp\"...}"}
3. TOOL: write, ARGUMENTS: {"filePath": "myapp/src/index.js", "content": "..."}
4. Continue until all files are created
```
@@ -1,172 +0,0 @@
# Design Decision: Improved Tool Instructions

**Date:** 2024-02-24
**Scope:** src/api/routes.py tool_instructions
**Lines Changed:** ~25 lines

## Problem

The current tool instructions (~125 tokens) fail to communicate key behavioral expectations:

1. **Passive vs Active:** The model describes what to do instead of doing it
2. **Refusal:** The model claims "I am only an AI assistant" instead of executing
3. **Incomplete:** Multi-file projects result in a README only

Evidence from the user report:
- Request: "Create React Hello World app"
- Result: README only (no actual files)
- Subsequent: commands given as text, not executed
- Final: an "I am only an AI assistant" refusal

## Root Cause Analysis

The instructions lack:
1. **Authority statement** - "You CAN and SHOULD use tools"
2. **Execution mandate** - "Execute commands, don't just describe them"
3. **Workflow clarity** - Clear step-by-step expectations
4. **Anti-pattern examples** - What NOT to do

## Options Considered

### Option 1: Minor Tweaks
Add a few lines to the existing instructions.
- **Pros:** Minimal token increase
- **Cons:** Band-aid fix; may not solve the root cause
- **Verdict:** REJECTED - Doesn't address the behavioral issue

### Option 2: Complete Rewrite with Strong Mandate
Rewrite the instructions to emphasize:
- Proactive tool usage
- Execution over explanation
- A clear workflow
- Anti-patterns to avoid

- **Pros:** Addresses the root cause; clear behavioral guidance
- **Cons:** Higher token count (estimated 300-400 tokens)
- **Verdict:** ACCEPTED - The proper fix for a behavioral issue

### Option 3: Few-Shot Examples
Include full conversation examples in the instructions.
- **Pros:** Shows exactly what to do
- **Cons:** Very high token count (1000+ tokens); may confuse the model
- **Verdict:** REJECTED - Violates the token budget

## Decision

Implement Option 2: rewrite with emphasis on proactivity and execution.

**Key additions:**
1. **Capability statement:** "You have tools. Use them."
2. **Execution mandate:** "Don't describe, execute"
3. **Workflow:** Clear request → tool → result → next cycle
4. **Anti-patterns:** Explicitly forbid "I cannot" responses

## Impact

### Token Budget (Exact Count - cl100k_base)
- **Current:** 478 tokens (1,810 characters)
- **Status:** Within the 2000-token limit ✓
- **Status:** Within the 500-token conservative estimate ✓
- **Context window:** A 16K model leaves ~15.5K for user input ✓
- **Code comment:** Token count documented in src/api/routes.py ✓
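
The count can be reproduced with tiktoken (a minimal sketch; `tool_instructions` is the string shown under Implementation below):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
# tool_instructions is the prompt string from the Implementation section
print(len(enc.encode(tool_instructions)))  # expected: 478
```
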
### Code Changes
- **File:** src/api/routes.py
- **Lines:** +48/-18 (net +30)
- **Type:** Instructions replacement
- **Token documentation:** Added an inline comment with the exact token count

### Breaking Changes
- **None** - The instructions are additive/clearer, not a different format

### Behavioral Changes
- **Expected:** More proactive tool usage
- **Expected:** No more "I cannot" refusals
- **Expected:** Multi-step projects completed via tools
- **Expected:** Commands executed, not described

### Review Blockers Addressed
- ✅ Token budget documented in code (REVIEW-2026-02-24-cleanup-branch.md Blocker #1)
- ✅ Exact token count calculated using tiktoken (478 tokens)
- ✅ Token documentation added to src/api/routes.py (REVIEW-2026-02-24-cleanup-branch.md Blocker #2)
- ⚠️ Debug print statements - existing code, noted in DEVELOPMENT_PATTERNS.md as "still needs work" (not in scope of this change)
- ⚠️ Integration tests for instruction effectiveness - requires running an actual LLM; covered by parsing tests

## Implementation

```python
tool_instructions = """You are an AI assistant with TOOL ACCESS. You CAN and MUST use tools to accomplish tasks.

**CRITICAL RULES:**
1. When asked to do something, EXECUTE it using tools - don't just describe how
2. You MUST use the bash tool to run commands like mkdir, npm, ls, etc.
3. You MUST use the write tool to create files
4. NEVER say "I cannot" or "I am only an AI assistant" - you HAVE tools, USE them
5. Complete tasks FULLY - don't stop at a README, create ALL required files

**AVAILABLE TOOLS:**
- read: Read file content
- write: Create/overwrite files
- bash: Execute shell commands (npm, mkdir, ls, etc.)

**TOOL FORMAT (STRICT):**
TOOL: tool_name
ARGUMENTS: {"param": "value"}

**WORKFLOW:**
1. User asks for something
2. You decide what tool to use
3. You respond with ONLY the TOOL: format above
4. You receive the tool result
5. You continue with the next tool until the task is COMPLETE

**EXAMPLES:**

Creating a project:
User: "Create a React app"
You: TOOL: bash
ARGUMENTS: {"command": "mkdir myapp && cd myapp && npm init -y"}
[wait for result]
You: TOOL: write
ARGUMENTS: {"filePath": "myapp/package.json", "content": "..."}
[continue until all files created]

Running commands:
User: "Install dependencies"
You: TOOL: bash
ARGUMENTS: {"command": "npm install"}
[wait for result, then confirm completion]

**WHAT NOT TO DO:**
- ❌ "To create a React app, you should run: mkdir myapp" (describing)
- ❌ "I cannot run commands, I am an AI" (refusing)
- ❌ Creating only a README instead of the full project (incomplete)
- ❌ "First do X, then do Y" (giving instructions instead of doing)

**CORRECT BEHAVIOR:**
- ✅ Execute the command immediately using the bash tool
- ✅ Create all files using the write tool
- ✅ Continue until the task is 100% complete
- ✅ Use ONE tool at a time and wait for results"""
```

## Testing

1. Test with a React Hello World request
2. Verify the model uses bash to create the directory structure
3. Verify the model uses write to create all files
4. Verify there are no "I cannot" responses

## Rollback Plan

If the new instructions cause issues:
1. Revert to the previous ~125-token version
2. Analyze what specifically failed
3. Iterate on smaller changes

## Success Metrics

- [ ] Model uses tools on the first request (not after prompting)
- [ ] Zero "I cannot" or "I am an AI" responses
- [ ] Multi-file projects fully created
- [ ] Commands executed, not described
@@ -1,151 +0,0 @@
# Design Decision: Task Planning and Verification Workflow

**Date:** 2024-02-24
**Scope:** src/api/routes.py tool_instructions
**Problem:** The model creates a folder but doesn't complete the full task or verify completion

## Problem Statement

The user reports:
1. "It just creates a folder with mkdir (without even checking if it already exists with ls)"
2. No verification that tasks are completed
3. No planning of the full task scope
4. The model stops after one step instead of completing the entire project

## Root Cause

The previous instructions told the model to "execute immediately" but didn't teach:
1. **Planning** - What needs to be done
2. **Checking** - What already exists
3. **Verification** - Did the step work
4. **Completion loop** - Keep going until done

## Solution

Add a **Task Completion Workflow** to the instructions:

```
**TASK COMPLETION WORKFLOW (MANDATORY):**

**1. PLAN:** List ALL steps needed before starting
**2. CHECK:** Use ls to verify what exists before creating
**3. EXECUTE:** Run the first step
**4. VERIFY:** Confirm the step worked (ls, read file)
**5. REPEAT:** Steps 3-4 until ALL are complete
**6. FINAL CHECK:** Verify the entire task is done
**7. CONFIRM:** Report completion with a checklist
```
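
In practice, the opening turns of a session that follows this workflow would look roughly like this (an illustrative transcript in the TOOL:/ARGUMENTS: format, not captured output):

```
PLAN: mkdir myapp → write package.json → write src files → npm install

TOOL: bash
ARGUMENTS: {"command": "ls -la"}
[result: directory is empty]

TOOL: bash
ARGUMENTS: {"command": "mkdir myapp"}

TOOL: bash
ARGUMENTS: {"command": "ls -la"}
[result: myapp/ exists - step verified, continue with package.json]
```
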
## Key Instruction Changes

### Added Planning Phase
Before doing anything, the model must think about the complete scope:
- What files/directories?
- What dependencies?
- The complete task requirements

### Added Verification Steps
Every step must be verified:
- `ls -la` after mkdir
- `read` the file after write
- Check the content is correct

### Added Completion Loop
The model must continue until:
✓ All directories exist
✓ All files exist with correct content
✓ All dependencies are installed
✓ Each component is verified

### Complete Working Example
Provided a 13-step React example showing:
1. Check existing files (ls)
2. Create the directory
3. Verify it was created (ls)
4. Create package.json
5. Verify package.json (read)
6. Create the source files
7. Final verification (find myapp -type f)
8. Install dependencies
9. Confirm the completion checklist

## Impact

### Token Budget
- **Before:** 1,041 tokens
- **After:** 1,057 tokens (+16 tokens)
- **Status:** Under the 2,000 limit ✓

### Behavioral Changes

**Before:**
- Model: mkdir myapp
- User: That's it?
- Result: an empty directory

**After:**
- Model checks what exists
- Creates the complete project structure
- Verifies each file
- Confirms completion
- Result: a working React project

## Success Criteria

When the user asks "Create React Hello World project", the model should:
1. ✓ Check the current directory contents
2. ✓ Create the myapp/ directory
3. ✓ Verify the directory was created
4. ✓ Create package.json
5. ✓ Verify the package.json content
6. ✓ Create src/App.js
7. ✓ Create src/index.js
8. ✓ Create public/index.html
9. ✓ Final verification (list all files)
10. ✓ npm install
11. ✓ Confirm the completion checklist

## Testing

Test that the instructions contain:
- the PLAN/CHECK keywords
- the VERIFY keyword
- the COMPLETE keyword

All tests pass: 11/11 ✓

## Trade-offs

**Pros:**
- Complete task execution
- Verification prevents partial work
- Clear completion criteria
- Better user experience

**Cons:**
- More tokens (but still under the limit)
- More verbose instructions
- May be slower (more verification steps)

## Related Files Changed

1. src/api/routes.py - Updated tool_instructions
2. tests/test_tool_parsing.py - Updated tests for the new content
3. docs/design/2024-02-24-task-planning-verification.md - This doc

## Future Improvements

1. **Task Queue System:** A server-side queue of pending operations
2. **State Persistence:** Remember what's been done across conversations
3. **Smart Resumption:** If interrupted, pick up where it left off
4. **Progress Reporting:** Show % complete during long tasks

## Conclusion

The new workflow teaches the model to be systematic:
1. Plan before acting
2. Check before creating
3. Verify after each step
4. Continue until complete

This should resolve the "only creates a folder" issue and ensure complete project creation.
@@ -1,132 +0,0 @@
# Design Decision: Tool Parsing Simplification

**Date:** 2024-02-24
**Scope:** src/api/routes.py parse_tool_calls function
**Lines Changed:** ~210 lines removed, ~30 lines added

## Problem

The tool parsing code had accumulated 4 different parsing formats over 25+ commits:
1. JSON `tool_calls` format with nested objects
2. TOOL:/ARGUMENTS: format (simple text)
3. Function pattern format `func_name(args)`
4. Multiple JSON handling variants

This caused:
- Circular development (adding/removing formats repeatedly)
- No single source of truth
- Complex, unmaintainable code
- No confidence that changes wouldn't break existing cases

## Options Considered

### Option 1: Keep All Formats
- **Pros:** Backward compatible
- **Cons:** 210 lines of unmaintainable code; continues the circular development pattern
- **Verdict:** REJECTED - Perpetuates the problem

### Option 2: Standardize on TOOL:/ARGUMENTS: Only
- **Pros:**
  - Simple regex pattern (~30 lines)
  - Matches the current tool instructions
  - Easy to test
  - A clear single format for models
- **Cons:**
  - Breaking change if any code relies on the old formats
  - Need to update any existing examples/docs
- **Verdict:** ACCEPTED - Aligns with Rule 5 (Parse Once, Parse Well)

### Option 3: Create a Parser per Format with Feature Flags
- **Pros:** Flexible, can toggle formats
- **Cons:**
  - Violates Rule 5 and "No Feature Flags in Core Logic"
  - Still maintains multiple code paths
- **Verdict:** REJECTED - Doesn't solve the root problem

## Decision

Standardize on the TOOL:/ARGUMENTS: format only. Remove all other parsing code.

**Rationale:**
- Per DEVELOPMENT_PATTERNS.md recommendation #3: "One Format Only"
- Token cost is minimal (no complex regex)
- Test coverage provides confidence
- Aligns with the existing tool instructions

## Impact

### Token Count
- **Parser code:** 210 lines → 30 lines (-180 lines)
- **No change** to the tool instructions (a separate optimization)

### Breaking Changes
- **Yes** - Removes support for:
  - the JSON `tool_calls` format in model responses
  - the function pattern format `read_file(path="test.txt")`

**Migration:** Models must use:
```
TOOL: read
ARGUMENTS: {"filePath": "test.txt"}
```

### Testing
- Unit tests added: 9 test cases
- Coverage: all parsing scenarios
- All tests pass
## Implementation

```python
# New implementation (30 lines)
def parse_tool_calls(text: str) -> tuple:
    """Parse tool calls using standardized format."""
    import json
    import re

    tool_pattern = r'TOOL:\s*(\w+)\s*\nARGUMENTS:\s*(\{[^}]*\})'
    tool_matches = list(re.finditer(tool_pattern, text, re.IGNORECASE))

    if not tool_matches:
        return text, None

    tool_calls = []
    for i, tool_match in enumerate(tool_matches):
        tool_name = tool_match.group(1)
        args_str = tool_match.group(2)
        try:
            args_dict = json.loads(args_str)
            tool_calls.append({
                "id": f"call_{i+1}",
                "type": "function",
                "function": {
                    "name": tool_name,
                    "arguments": json.dumps(args_dict)
                }
            })
        except json.JSONDecodeError:
            continue

    if not tool_calls:
        return text, None

    first_start = tool_matches[0].start()
    content = text[:first_start].strip()

    return content, tool_calls
```
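
For example, applied to a typical model response (a quick usage sketch of the function above):

```python
text = 'Reading the file now.\nTOOL: read\nARGUMENTS: {"filePath": "test.txt"}'
content, tool_calls = parse_tool_calls(text)
print(content)                            # "Reading the file now."
print(tool_calls[0]["function"]["name"])  # "read"
```
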
## Verification

Run the tests:
```bash
python tests/test_tool_parsing.py
```

Expected: 9 passed, 0 failed

## Follow-up

- [x] Update DEVELOPMENT_PATTERNS.md to mark this as completed
- [x] Add unit tests
- [ ] Consider an integration test for the full tool execution flow
@@ -1,98 +0,0 @@
# Investigation: 31k Token Context Issue

## Problem
When making requests through opencode to local_swarm, the LLM receives ~31k tokens of context, even for simple empty-directory queries.

## Root Cause Identified

**NOT an issue with this repo's codebase - this is expected behavior for function calling.**

### How it works:

1. **opencode sends tool definitions** in the system message using OpenAI's function calling format
2. **Each tool definition is ~450 tokens** (name + description + parameters)
3. **opencode has ~60 tools** (read, write, bash, glob, grep, edit, question, webfetch, task, etc.)
4. **Total tool definition tokens:** ~27,000 tokens

### Calculation:
```
Single tool definition: ~450 tokens
Number of tools: ~60
Tool schemas total: ~27,000 tokens
System message: ~500 tokens
User query: ~100 tokens
---
Total: ~27,600 tokens
```

**This matches the observed ~31k tokens.**

## Why This Happens

OpenAI's function calling protocol requires sending the **complete function schemas** to the LLM with every request. This is how the model:
- Knows what tools are available
- Understands parameter requirements
- Knows how to format tool calls

All major LLM providers using function calling work this way (OpenAI, Anthropic, local models, etc.).

## Verification

```bash
python -c "
import tiktoken
enc = tiktoken.get_encoding('cl100k_base')

# Example from an actual opencode tool definition
read_tool_schema = '''{\"type\": \"function\", \"function\": {\"name\": \"read\", \"description\": \"Read a file or directory from the local filesystem...[full description]\", \"parameters\": {...}}}'''

print(f'Single tool schema: {len(enc.encode(read_tool_schema))} tokens')
print(f'Estimated 60 tools: {len(enc.encode(read_tool_schema)) * 60:,} tokens')
"
```

Result:
- Single tool definition: ~451 tokens
- 60 tools: ~27,060 tokens
- Plus the system + user message: ~27,660 total

## This Is NOT a Bug

The 31k token context is **correct and expected** for function calling with 60+ tools. This is how:
- the OpenAI API works
- the Claude API works
- local models with function calling work

## Potential Optimizations (Optional)

If reducing the context size is critical, consider:

### Option 1: Dynamic Tool Selection
- Only send the tools relevant to the current task (see the sketch below)
- Example: for file operations, only send [read, write, glob, edit]
- Trade-off: requires opencode to intelligently filter tools
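
A minimal sketch of such filtering, assuming OpenAI-style tool schemas; `all_tools` and the allowed set are hypothetical names:

```python
FILE_TOOLS = {"read", "write", "glob", "edit"}

def select_tools(all_tools: list[dict], allowed: set[str]) -> list[dict]:
    # Keep only the schemas whose function name is in the allowed set.
    return [t for t in all_tools if t["function"]["name"] in allowed]

# tools = select_tools(all_tools, FILE_TOOLS)  # ~4 schemas instead of ~60
```

The saving is proportional to how many schemas are dropped, since each one costs roughly 450 tokens.
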
### Option 2: Compressed Tool Descriptions
- Shorten tool descriptions to the essentials
- Example: "Read file at path (required: filePath)"
- Trade-off: the model may make more errors with less guidance

### Option 3: Tool Grouping
- Group similar tools into a single "tools: [read, write, glob]" parameter
- Trade-off: breaks OpenAI compatibility

## Recommendation

**NO ACTION REQUIRED.** The 31k token context is:
- Standard for function calling with many tools
- Within the capabilities of modern LLMs (32k-128k context windows)
- Not caused by this repo's code

The `.opencodeignore` created earlier will help with opencode's own system prompt, but doesn't affect the LLM context sent to local_swarm.

## Additional Finding

While investigating, verified:
- `config/prompts/tool_instructions.txt`: 125 tokens ✅
- This repo's tool execution code: no token bloat ✅
- The issue is purely opencode's function calling protocol ✅
@@ -1,112 +0,0 @@
# Test Plan: Fix Tool Execution and Token Reporting

## Problem Analysis

### Issue 1: Model Gives Instructions Instead of Executing
**Current behavior:** The model describes what to do ("You should run mkdir...") instead of using the TOOL: format
**Expected:** The model responds with TOOL: bash\nARGUMENTS: {"command": "mkdir..."}

### Issue 2: Token Counting Inaccurate
**Current:** Rough estimate `len(prompt) // 4`
**Expected:** Accurate token count using tiktoken
**Impact:** opencode can't properly manage the context window

### Issue 3: npx Commands Time Out / Need Input
**Current:** `npx create-react-app .` prompts for confirmation (y/n)
**Expected:** Non-interactive execution or manual file creation
**Evidence:** "Need to install the following packages: create-react-app@5.1.0 Ok to proceed? (y)"

## Unit Tests

### Test 1: Accurate Token Counting
- [ ] Verify the token count uses tiktoken (not a rough estimate)
- [ ] Test with known token counts
- [ ] Verify prompt_tokens + completion_tokens = total_tokens

### Test 2: Non-Interactive Bash Commands
- [ ] Verify npm/npx commands use --yes or equivalent flags
- [ ] Test timeout handling for package managers
- [ ] Verify commands don't prompt for user input

### Test 3: Tool Instructions Content
- [ ] Verify the instructions emphasize "EXECUTE not DESCRIBE"
- [ ] Verify manual file creation examples (not npx)
- [ ] Verify anti-patterns are clearly stated

## Integration Tests

### Test 4: End-to-End React Project Creation
**Input:** "Create a React Hello World app"

**Expected Flow:**
1. TOOL: bash, ARGUMENTS: {"command": "mkdir myapp"}
2. TOOL: write, ARGUMENTS: {"filePath": "myapp/package.json", "content": "..."}
3. TOOL: write, ARGUMENTS: {"filePath": "myapp/src/App.js", "content": "..."}
4. Continue until complete

**Failure Modes:**
- [ ] Model describes steps instead of executing them
- [ ] Uses npx create-react-app (should manually create files)
- [ ] Stops after a README only

### Test 5: Token Reporting Accuracy
**Input:** Any chat completion request

**Expected:**
- usage.prompt_tokens matches the actual tokens
- usage.completion_tokens matches the actual tokens
- usage.total_tokens is the sum

**Verification:**
- Compare the tiktoken count vs the API response

## Manual Verification

```bash
# Test React creation
python main.py --auto &
curl -X POST http://localhost:17615/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-Client-Working-Dir: /tmp/test-project" \
  -d '{
    "model": "local-swarm",
    "messages": [{"role": "user", "content": "Create a React Hello World app"}],
    "tools": [{"type": "function", "function": {"name": "bash"}}, {"type": "function", "function": {"name": "write"}}]
  }'

# Check token accuracy
curl -X POST http://localhost:17615/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "local-swarm",
    "messages": [{"role": "user", "content": "Hello"}]
  }' | jq '.usage'
```

## Success Criteria

1. **Execution:** 100% of requests use the TOOL: format (not descriptions)
2. **Accuracy:** Token counts match tiktoken within ±5%
3. **Completion:** Multi-file projects fully created via the write tool
4. **No npx:** Manual file creation for React (no npx create-react-app)

## Implementation Notes

### Token Counting Fix
```python
# Replace: prompt_tokens = len(prompt) // 4
# With:
import tiktoken
encoding = tiktoken.get_encoding('cl100k_base')
prompt_tokens = len(encoding.encode(prompt))
completion_tokens = len(encoding.encode(content))
```

### Tool Instructions Fix
- Add an explicit "DO NOT USE npx create-react-app" instruction
- Add an "EXECUTE IMMEDIATELY" mandate
- Show a complete React example with manual file creation

### Non-Interactive Commands
- Auto-add --yes to npx commands
- Or recommend manual file creation instead
@@ -1,97 +0,0 @@
# Test Plan: Improved Tool Instructions

## Problem Statement
The model is not using tools effectively:
1. It creates a README instead of the actual project structure
2. It provides commands as text instead of executing them
3. It refuses to run commands, claiming "I am only an AI assistant"

## Root Cause Analysis
The current instructions don't clearly communicate:
- That the model SHOULD use tools proactively
- That execution is expected, not explanation
- The workflow: user request → tool execution → result

## Unit Tests (Instruction Verification)

### Test 1: Instruction Presence
- [ ] Verify the instructions are injected into the system message
- [ ] Verify the instructions appear at the START of the system message (priority position)

### Test 2: Token Count
- [ ] Measure the total token count of the new instructions
- [ ] Verify ≤ 500 tokens (conservative budget)
- [ ] Document before/after

### Test 3: Format Compliance
- [ ] Verify the instructions include the TOOL:/ARGUMENTS: format
- [ ] Verify the examples use the correct format
- [ ] Verify the rules are clear and numbered

## Integration Tests (Behavioral)

### Test 4: Project Creation Flow
**Input:** "Create a React Hello World app"

**Expected Behavior:**
1. The model responds with TOOL: bash, ARGUMENTS: mkdir myapp
2. After the result, TOOL: write, ARGUMENTS: package.json content
3. After the result, TOOL: write, ARGUMENTS: src/App.js content
4. Continue until the complete project structure exists

**Failure Modes:**
- [ ] Model only describes what to do
- [ ] Model creates a README only
- [ ] Model refuses to execute commands

### Test 5: Multi-step Task
**Input:** "Check what files exist, then create a test.txt file with 'hello' in it"

**Expected Behavior** (spelled out in the strict format below):
1. TOOL: bash, ARGUMENTS: ls -la
2. Wait for the result
3. TOOL: write, ARGUMENTS: test.txt with "hello"
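
The two expected responses in the strict format (illustrative):

```
TOOL: bash
ARGUMENTS: {"command": "ls -la"}
```

then, after receiving the tool result:

```
TOOL: write
ARGUMENTS: {"filePath": "test.txt", "content": "hello"}
```
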
**Failure Modes:**
- [ ] Model tries to do both in one response
- [ ] Model doesn't wait for the ls result before writing

### Test 6: Command Refusal
**Input:** "Run npm install"

**Expected Behavior:**
1. TOOL: bash, ARGUMENTS: npm install

**Failure Modes:**
- [ ] Model responds: "I cannot run commands, I am only an AI assistant"
- [ ] Model explains npm install instead of running it

## Manual Verification Commands

```bash
# Start the server
python main.py --auto

# In another terminal, test with curl
curl -X POST http://localhost:17615/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "local-swarm",
    "messages": [{"role": "user", "content": "Create a React Hello World app"}],
    "tools": [{"type": "function", "function": {"name": "bash", "description": "Run shell commands"}}, {"type": "function", "function": {"name": "write", "description": "Write files"}}]
  }'
```

## Success Criteria

1. **Proactivity:** The model uses tools without being asked twice
2. **Execution:** The model runs commands, it doesn't just describe them
3. **No Refusal:** The model never says "I cannot" or "I am only an AI"
4. **Completeness:** Multi-file projects are fully created via tools
5. **Format:** 100% of tool calls use the correct TOOL:/ARGUMENTS: format

## Metrics

- **Tool usage rate:** % of requests that result in tool calls
- **Format compliance:** % of tool calls in the correct format
- **Completion rate:** % of multi-step tasks fully completed
@@ -1,35 +0,0 @@
# Test Plan: Tool Parsing Simplification

## Unit Tests

- [x] Test case 1: Single tool call → returns 1 tool with the correct name and arguments
- [x] Test case 2: No tool in text → returns None for tools, original text as content
- [x] Test case 3: Multiple tools → returns all tools in order
- [x] Test case 4: Content before tool → content extracted, tool parsed correctly
- [x] Test case 5: Bash tool → correctly parses the bash command
- [x] Test case 6: Case insensitive → "tool:" and "TOOL:" both work
- [x] Test case 7: Invalid JSON → skips the invalid call, continues with valid ones
- [x] Test case 8: Empty text → returns None, empty string
- [x] Test case 9: Whitespace only → returns None

## Integration Tests

- [ ] End-to-end flow:
  1. Send a chat completion request with tools
  2. The model responds in the TOOL:/ARGUMENTS: format
  3. The parser extracts the tool call
  4. The tool executes
  5. The result is returned in the response

- [ ] Expected result: the tool executes successfully and the result is included in the response

## Manual Verification

- [ ] Command: `python tests/test_tool_parsing.py`
- [ ] Expected output: "9 passed, 0 failed"

## Token Budget Verification

- Parser code: ~30 lines (~200 tokens)
- Well under the 2000-token limit
- The simple regex pattern keeps complexity low
+451 -56

@@ -7,7 +7,7 @@ import json
import logging
import time
import uuid
from typing import Optional
from typing import Optional, List

from api.models import (
    ChatCompletionRequest,
@@ -20,11 +20,54 @@ from api.formatting import format_messages_with_tools
from api.tool_parser import parse_tool_calls
from utils.token_counter import count_tokens
from tools.executor import get_tool_executor
from chatlog import get_chat_logger


logger = logging.getLogger(__name__)


def _extract_working_dir_from_prompt(prompt: str) -> Optional[str]:
    """Extract working directory from user prompt.

    Looks for patterns like:
    - "in the /path/to/dir directory"
    - "in directory /path/to/dir"
    - "in /path/to/dir"
    - "under /path/to/dir"
    - "from /path/to/dir"

    Args:
        prompt: User prompt text

    Returns:
        Extracted directory path or None
    """
    import re
    import os

    # Common patterns for directory mentions
    patterns = [
        r'in the\s+([/~]?[\w\-/.]+)\s+(?:directory|folder|dir)',
        r'in\s+(?:directory|folder|dir)\s+([/~]?[\w\-/.]+)',
        r'(?:in|under|from|at)\s+([/~]?[\w\-/.]{3,})',  # At least 3 chars to avoid "in a"
    ]

    for pattern in patterns:
        match = re.search(pattern, prompt, re.IGNORECASE)
        if match:
            path = match.group(1)
            # Validate it looks like a path
            if path.startswith('/') or path.startswith('~') or '/' in path:
                # Expand home directory
                if path.startswith('~'):
                    path = os.path.expanduser(path)
                # Check if it's a valid directory or parent exists
                if os.path.isdir(path) or os.path.isdir(os.path.dirname(path)):
                    return os.path.abspath(path)

    return None


def _sanitize_tools(tools: Optional[list]) -> Optional[list]:
    """Sanitize tool definitions to fix invalid schemas.
@@ -61,19 +104,19 @@ async def _execute_tools(
    tool_calls: list,
    client_working_dir: Optional[str],
    executor
) -> str:
) -> List[tuple]:
    """Execute tool calls and return results.

    Args:
        tool_calls: List of parsed tool calls
        client_working_dir: Working directory for file operations
        executor: Tool executor instance

    Returns:
        Combined tool results as string
        List of tuples (tool_name, result_string)
    """
    from api.routes import execute_tool_server_side

    tool_results = []
    for i, tc in enumerate(tool_calls):
        tool_name = tc.get("function", {}).get("name", "")
@@ -85,10 +128,10 @@

        logger.debug(f" [{i+1}/{len(tool_calls)}] Executing: {tool_name}({tool_args})")
        result = await execute_tool_server_side(tool_name, tool_args, working_dir=client_working_dir)
        tool_results.append(f"Tool '{tool_name}' result: {result}")
        tool_results.append((tool_name, result))
        logger.debug(f" ✓ Completed: {result[:100]}..." if len(result) > 100 else f" ✓ Result: {result}")

    return "\n\n".join(tool_results)
    return tool_results


def _create_response(
@@ -97,10 +140,25 @@ def _create_response(
    finish_reason: str,
    prompt: str,
    request: ChatCompletionRequest,
    swarm_manager=None
    swarm_manager=None,
    thinking_content: Optional[str] = None
) -> ChatCompletionResponse:
    """Create a chat completion response.

    Args:
        content: Final response content (after tool execution if any)
        tool_calls: List of tool calls
        finish_reason: Finish reason
        prompt: Original prompt for token counting
        request: Original request
        swarm_manager: Swarm manager instance (optional, for getting model name)
        thinking_content: Intermediate thinking/planning content to include in streaming as reasoning_content

    Returns:
        ChatCompletionResponse
    """
    """Create a chat completion response.

    Args:
        content: Response content
        tool_calls: List of tool calls
@@ -141,7 +199,7 @@

    message = ChatMessage(**message_kwargs)

    return ChatCompletionResponse(
    response = ChatCompletionResponse(
        id=f"chatcmpl-{uuid.uuid4().hex[:12]}",
        created=int(time.time()),
        model=model_name,
@@ -162,26 +220,56 @@
        system_fingerprint=system_fingerprint
    )

    # Attach thinking content for streaming (not part of JSON serialization)
    # Use a private attribute to avoid interfering with model serialization
    if thinking_content is not None:
        setattr(response, '_thinking', thinking_content)

async def _generate_with_local_swarm(
    swarm_manager,
    return response


async def _generate_with_consensus(
    prompt: str,
    max_tokens: int,
    temperature: float,
    stream: bool = False
    swarm_manager,
    federated_swarm=None
) -> tuple[str, int, float]:
    """Generate response using local swarm.
    """Generate response with consensus (local or federated).

    This is the unified generation interface - it handles both local-only
    and federated generation transparently. Callers don't need to know
    which mode is being used.

    Args:
        swarm_manager: Swarm manager instance
        prompt: Prompt to generate from
        max_tokens: Maximum tokens to generate
        temperature: Sampling temperature
        stream: Whether this is a streaming request
        swarm_manager: Local swarm manager instance
        federated_swarm: Optional federated swarm for multi-node consensus

    Returns:
        Tuple of (response_text, tokens_generated, tokens_per_second)
    """
    # Check if federation is available
    if federated_swarm is not None:
        peers = federated_swarm.discovery.get_peers()
        if peers:
            logger.debug(f"🌐 Using federation with {len(peers)} peer(s)")
            try:
                fed_result = await federated_swarm.generate_with_federation(
                    prompt=prompt,
                    max_tokens=max_tokens,
                    temperature=temperature
                )
                # Federation returns FederationResult object
                # Extract the final response text
                return fed_result.final_response, 0, 0.0  # Tokens/TPS not tracked in federation mode
            except Exception as e:
                logger.warning(f"Federation failed, falling back to local: {e}")
                # Fall through to local generation

    # Local generation (fallback or no federation)
    try:
        result = await swarm_manager.generate(
            prompt=prompt,
@@ -189,18 +277,178 @@ async def _generate_with_local_swarm(
|
||||
temperature=temperature,
|
||||
use_consensus=True
|
||||
)
|
||||
|
||||
response = result.selected_response
|
||||
return (
|
||||
response.text,
|
||||
response.tokens_generated,
|
||||
response.tokens_per_second
|
||||
)
|
||||
return response.text, response.tokens_generated, response.tokens_per_second
|
||||
except Exception as e:
|
||||
logger.exception("Error in swarm generation")
|
||||
raise
|
||||
|
||||
|
||||
+def _tool_calls_agree(tool_calls_list: List[List[dict]]) -> bool:
+    """Check if all workers agree on the same tool calls.
+
+    Args:
+        tool_calls_list: List of tool calls from each worker
+
+    Returns:
+        True if all workers have the same tool calls
+    """
+    if not tool_calls_list:
+        return True
+
+    # Check if all have the same number of tool calls
+    first_count = len(tool_calls_list[0])
+    if not all(len(tc) == first_count for tc in tool_calls_list):
+        logger.warning(f"  ⚠️ Workers disagree on number of tool calls: {[len(tc) for tc in tool_calls_list]}")
+        return False
+
+    if first_count == 0:
+        return True  # All agree on no tools
+
+    # Check if tool names and arguments match
+    for i in range(first_count):
+        first_tool = tool_calls_list[0][i]
+        first_name = first_tool.get("function", {}).get("name", "")
+        first_args = first_tool.get("function", {}).get("arguments", "")
+
+        for j, other_calls in enumerate(tool_calls_list[1:], 1):
+            other_tool = other_calls[i]
+            other_name = other_tool.get("function", {}).get("name", "")
+            other_args = other_tool.get("function", {}).get("arguments", "")
+
+            if first_name != other_name:
+                logger.warning(f"  ⚠️ Worker {j+1} disagrees on tool name: {first_name} vs {other_name}")
+                return False
+
+            # For arguments, do a loose comparison (ignore whitespace differences)
+            try:
+                first_args_norm = json.loads(first_args) if isinstance(first_args, str) else first_args
+                other_args_norm = json.loads(other_args) if isinstance(other_args, str) else other_args
+                if first_args_norm != other_args_norm:
+                    logger.warning(f"  ⚠️ Worker {j+1} disagrees on arguments for {first_name}")
+                    return False
+            except json.JSONDecodeError:
+                # If JSON parsing fails, compare as strings
+                if str(first_args).strip() != str(other_args).strip():
+                    logger.warning(f"  ⚠️ Worker {j+1} disagrees on arguments for {first_name}")
+                    return False
+
+    logger.info(f"  ✅ All {len(tool_calls_list)} workers agree on tool calls")
+    return True
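The agreement check above hinges on a loose JSON comparison of arguments, so formatting differences between workers don't count as disagreement. A standalone sketch of that idea (the `args_match` helper is hypothetical, not part of this diff):

```python
import json

def args_match(a: str, b: str) -> bool:
    """Loose comparison of tool-call arguments: parse JSON when possible
    so whitespace and key-order differences don't count as disagreement."""
    try:
        return json.loads(a) == json.loads(b)
    except json.JSONDecodeError:
        return a.strip() == b.strip()

# Two workers emit the same call with different formatting: still agreement.
assert args_match('{"filePath": "notes.txt"}', '{ "filePath":"notes.txt" }')
# A genuinely different argument is a disagreement.
assert not args_match('{"filePath": "notes.txt"}', '{"filePath": "other.txt"}')
```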
+async def _generate_with_tool_consensus(
+    swarm_manager,
+    prompt: str,
+    max_tokens: int,
+    temperature: float
+) -> tuple[str, List[dict], int, float]:
+    """Generate response with tool call consensus checking.
+
+    When multiple workers are active, this ensures they all agree on tool calls
+    before executing them. If they disagree, returns the best response without tools.
+
+    Args:
+        swarm_manager: Swarm manager instance
+        prompt: Prompt to generate from
+        max_tokens: Maximum tokens to generate
+        temperature: Sampling temperature
+
+    Returns:
+        Tuple of (response_text, tool_calls, tokens_generated, tps)
+    """
+    try:
+        # Get status to check number of workers
+        status = swarm_manager.get_status()
+        num_workers = getattr(status, 'active_workers', 1)
+
+        # If only one worker, use normal generation
+        if num_workers <= 1:
+            logger.debug("  Single worker mode - skipping tool consensus")
+            result = await swarm_manager.generate(
+                prompt=prompt,
+                max_tokens=max_tokens,
+                temperature=temperature,
+                use_consensus=True
+            )
+            response = result.selected_response
+            parsed_content, tool_calls = parse_tool_calls(response.text)
+            return response.text, tool_calls, response.tokens_generated, response.tokens_per_second
+
+        # Multiple workers - check for tool consensus
+        logger.info(f"  🔍 Checking tool consensus across {num_workers} workers...")
+
+        # Generate from all workers individually
+        from swarm.manager import GenerationRequest
+        all_responses = []
+        all_tool_calls = []
+
+        # Get all active workers
+        workers = swarm_manager.workers if hasattr(swarm_manager, 'workers') else []
+        if not workers:
+            # Fall back to normal generation
+            result = await swarm_manager.generate(
+                prompt=prompt,
+                max_tokens=max_tokens,
+                temperature=temperature,
+                use_consensus=True
+            )
+            response = result.selected_response
+            parsed_content, tool_calls = parse_tool_calls(response.text)
+            return response.text, tool_calls, response.tokens_generated, response.tokens_per_second
+
+        # Generate from each worker
+        for i, worker in enumerate(workers):
+            try:
+                gen_result = await worker.generate(
+                    GenerationRequest(prompt=prompt, max_tokens=max_tokens, temperature=temperature)
+                )
+                response_text = gen_result.text
+                parsed_content, tool_calls = parse_tool_calls(response_text)
+                all_responses.append(response_text)
+                all_tool_calls.append(tool_calls)
+                logger.debug(f"  Worker {i+1}: {len(tool_calls)} tool call(s)")
+            except Exception as e:
+                logger.warning(f"  Worker {i+1} failed: {e}")
+                all_responses.append("")
+                all_tool_calls.append([])
+
+        # Check consensus
+        if _tool_calls_agree(all_tool_calls):
+            # All agree - use the first response's tool calls
+            best_response = all_responses[0] if all_responses else ""
+            best_tool_calls = all_tool_calls[0] if all_tool_calls else []
+            total_tokens = sum(len(r.split()) for r in all_responses if r) // len([r for r in all_responses if r])
+            avg_tps = 10.0  # Estimate
+            return best_response, best_tool_calls, total_tokens, avg_tps
+        else:
+            # Disagreement - fall back to consensus strategy without tools
+            logger.warning("  ⚠️ Tool consensus failed - falling back to text response")
+            result = await swarm_manager.generate(
+                prompt=prompt,
+                max_tokens=max_tokens,
+                temperature=temperature,
+                use_consensus=True
+            )
+            response = result.selected_response
+            # Strip any tool calls to be safe
+            parsed_content, _ = parse_tool_calls(response.text)
+            return parsed_content, [], response.tokens_generated, response.tokens_per_second
+
+    except Exception as e:
+        logger.exception("Error in tool consensus generation")
+        # Fall back to normal generation
+        result = await swarm_manager.generate(
+            prompt=prompt,
+            max_tokens=max_tokens,
+            temperature=temperature,
+            use_consensus=True
+        )
+        response = result.selected_response
+        parsed_content, tool_calls = parse_tool_calls(response.text)
+        return response.text, tool_calls, response.tokens_generated, response.tokens_per_second
async def _generate_with_federation(
    federated_swarm,
    prompt: str,
@@ -263,6 +511,29 @@ async def handle_chat_completion(
    prompt = format_messages_with_tools(request.messages, None)
    has_tools = request.tools is not None and len(request.tools) > 0

+   # Initialize chat logger (if enabled via LOCAL_SWARM_CHATLOG=1)
+   chat_logger = get_chat_logger()
+
+   # Extract working directory from prompt if not provided by client
+   if client_working_dir is None:
+       # Try to extract from user messages
+       for msg in reversed(request.messages):
+           if msg.role == 'user':
+               extracted_dir = _extract_working_dir_from_prompt(msg.content)
+               if extracted_dir:
+                   client_working_dir = extracted_dir
+                   logger.info(f"📁 Extracted working directory from prompt: {client_working_dir}")
+                   break
+
+   # Log initial conversation history to chatlog
+   for msg in request.messages:
+       if msg.role == 'user':
+           chat_logger.log_user_message(msg.content)
+       elif msg.role == 'assistant':
+           chat_logger.log_assistant_message(msg.content, has_tool_calls=bool(msg.tool_calls))
+       elif msg.role == 'tool':
+           chat_logger.log_tool_result("tool", msg.content)
+
    logger.info(f"\n{'='*60}")
    logger.info(f"CHAT COMPLETION REQUEST:")
    logger.info(f"  has_tools={has_tools}, stream={request.stream}")
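`_extract_working_dir_from_prompt` itself is not shown in this diff. Based on the `in /path/to/dir` pattern it targets, a minimal sketch might look like the following (the function body is an assumption, for illustration only):

```python
import re

def extract_working_dir(prompt: str) -> str | None:
    """Hypothetical sketch: pull a working directory out of phrases like
    'list the files in /home/user/project'. The real helper is not shown
    in this diff, so treat this as an illustration only."""
    match = re.search(r"\bin\s+(~?/[\w./-]+)", prompt)
    return match.group(1) if match else None

print(extract_working_dir("read the log in /var/log/myapp"))  # -> /var/log/myapp
```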
@@ -270,21 +541,18 @@ async def handle_chat_completion(
    logger.info(f"  messages={len(request.messages)}")
    logger.info(f"{'='*60}")

-   # Use federation if available
-   if federated_swarm is not None:
-       peers = federated_swarm.discovery.get_peers()
-       if peers:
-           logger.info(f"🌐 Using federation with {len(peers)} peer(s)...")
-           content, tool_calls, finish_reason = await _generate_with_federation(
-               federated_swarm, prompt, request.max_tokens or 1024, request.temperature or 0.7
-           )
-           return _create_response(content, tool_calls, finish_reason, prompt, request, swarm_manager)
-
    # Build conversation history
    messages = list(request.messages)

+   # Determine if we should use federation for generation
+   use_federation = federated_swarm is not None and len(federated_swarm.discovery.get_peers()) > 0
+   if use_federation:
+       logger.info(f"🌐 Federation available with peers")
+
+   # Track thinking content for streaming (OpenCode reasoning_content)
+   thinking_content: Optional[str] = None
+   thinking_captured = False
+
    # Initialize iteration counter and response text
    iteration = 0
    max_iterations = 3
@@ -295,10 +563,31 @@ async def handle_chat_completion(
        logger.info(f"--- Tool Execution Iteration {iteration} ---")

        # Generate response
        logger.debug(f"Generating response...")
-       response_text, tokens_generated, tps = await _generate_with_local_swarm(
-           swarm_manager, prompt, request.max_tokens or 1024, request.temperature or 0.7
-       )
+       # IMPORTANT: Only use federation on FIRST iteration (initial planning)
+       # Subsequent iterations process tool results which only head node has
+       if iteration == 1 and use_federation:
+           # First iteration: use federation for consensus on initial plan
+           logger.info(f"🌐 Using federation for initial generation...")
+           response_text, tokens_generated, tps = await _generate_with_consensus(
+               prompt=prompt,
+               max_tokens=request.max_tokens or 1024,
+               temperature=request.temperature or 0.7,
+               swarm_manager=swarm_manager,
+               federated_swarm=federated_swarm
+           )
+       else:
+           # Subsequent iterations: LOCAL ONLY
+           # Peers don't have tool results from previous iterations
+           # Using federation here would cause inconsistent context
+           if iteration > 1:
+               logger.debug(f"Using local generation (iteration {iteration}, tool context local only)")
+           response_text, tokens_generated, tps = await _generate_with_consensus(
+               prompt=prompt,
+               max_tokens=request.max_tokens or 1024,
+               temperature=request.temperature or 0.7,
+               swarm_manager=swarm_manager,
+               federated_swarm=None  # Force local-only
+           )

        logger.info(f"Generated response ({len(response_text)} chars, {tokens_generated} tokens)")
        logger.debug(f"Response: {response_text[:200]}...")
@@ -306,10 +595,30 @@ async def handle_chat_completion(
        # Check for tool calls
        parsed_content, tool_calls_parsed = parse_tool_calls(response_text)

+       # Log assistant response to chatlog
+       chat_logger.log_assistant_message(response_text, has_tool_calls=bool(tool_calls_parsed))
+
+       if tool_calls_parsed:
+           # Log each tool call
+           for i, tc in enumerate(tool_calls_parsed, 1):
+               tool_name = tc.get("function", {}).get("name", "")
+               args_str = tc.get("function", {}).get("arguments", "{}")
+               try:
+                   args_dict = json.loads(args_str) if isinstance(args_str, str) else args_str
+               except json.JSONDecodeError:
+                   args_dict = {"raw": args_str}
+               chat_logger.log_tool_call(tool_name, args_dict, i)
+
+       # Capture thinking for OpenCode streaming (first occurrence only)
+       if not thinking_captured:
+           # Use the parsed content (without tool calls) as the reasoning
+           thinking_content = parsed_content or ""
+           thinking_captured = True
+
        if not tool_calls_parsed:
            # No more tools - this is the final answer
            logger.info(f"✅ Final answer (no tools) after {iteration} iteration(s)")
-           return _create_response(parsed_content, [], "stop", prompt, request, swarm_manager)
+           return _create_response(parsed_content, [], "stop", prompt, request, swarm_manager, thinking_content)

        # Tools detected - execute them
        logger.info(f"🔧 Found {len(tool_calls_parsed)} tool call(s)")
@@ -318,22 +627,73 @@ async def handle_chat_completion(
            args_str = tc.get("function", {}).get("arguments", "{}")
            logger.info(f"  [{i+1}] {tool_name}: {args_str[:100]}...")

-       # Add assistant message to history
-       messages.append(ChatMessage(role="assistant", content=response_text))
+       # Add assistant message to history with tool_calls (if any)
+       # This preserves the tool call IDs for proper tool message association
+       assistant_message = ChatMessage(
+           role="assistant",
+           content=response_text
+       )
+       if tool_calls_parsed:
+           # Convert tool calls to proper ToolCall objects with IDs
+           from api.models import ToolCall
+           tc_objects = []
+           for i, tc_dict in enumerate(tool_calls_parsed):
+               tc_id = tc_dict.get("id", f"call_{i}")
+               tc_objects.append(ToolCall(
+                   id=tc_id,
+                   type="function",
+                   function={
+                       "name": tc_dict["function"]["name"],
+                       "arguments": tc_dict["function"]["arguments"]
+                   }
+               ))
+           assistant_message.tool_calls = tc_objects
+
+       messages.append(assistant_message)

        # Execute all tools
        logger.info(f"⏱️ Executing tools...")
-       tool_results_str = await _execute_tools(tool_calls_parsed, client_working_dir, get_tool_executor())
+       tool_results = await _execute_tools(tool_calls_parsed, client_working_dir, get_tool_executor())

-       # Add tool result to history with STOP instruction
-       # The model needs to be told explicitly to STOP calling tools
-       tool_result_with_instruction = (
-           f"{tool_results_str}\n\n"
-           f"IMPORTANT: You have received the tool result above. "
-           f"DO NOT call any more tools. Provide your final answer now."
-       )
-       messages.append(ChatMessage(role="tool", content=tool_result_with_instruction))
-       logger.info(f"✅ Tools executed ({len(tool_results_str)} chars)")
+       # Log tool results to chatlog (single combined log for debugging)
+       combined_strings = [f"Tool {i+1} ({name}): {result}" for i, (name, result) in enumerate(tool_results)]
+       chat_logger.log_tool_result("combined", "\n\n".join(combined_strings), success=True)
+
+       # Add tool result to history - one message per tool call with proper tool_call_id
+       for i, ((tool_name, tool_result), tc) in enumerate(zip(tool_results, tool_calls_parsed)):
+           tool_call_id = tc.get("id", f"call_{i}")
+
+           # Format the tool result message with explicit instructions
+           # This tells the model exactly what to do with the result
+           if tool_name == "read":
+               instruction = "The file contents are shown above. READ THIS FILE CONTENT ALOUD to the user. Do not call additional tools."
+           elif tool_name == "write":
+               instruction = "The file has been successfully written. CONFIRM to the user that the file was created with the content shown above. Do not call additional tools."
+           elif tool_name == "bash":
+               # Check if this was a verification command (ls, grep) vs an action command
+               if "ls" in tool_result.lower() or "grep" in tool_result.lower():
+                   instruction = "CRITICAL: The listing is shown above. If the user asked to READ a specific file and you can see it exists in this listing, you MUST immediately USE THE read TOOL NOW with the exact filename from the listing. Do not summarize first - READ THE FILE immediately. Use the filename exactly as shown (e.g., 'my-secret.log' not '/path/to/my-secret.log'). If the user asked to just CHECK what files exist (without reading), then summarize. If the requested file is NOT in the listing, tell the user it doesn't exist."
+               else:
+                   instruction = "The command has been executed. SUMMARIZE the output above to answer the user's request. Do not call additional tools."
+           else:
+               instruction = "The tool has completed. Use the result shown above to answer the user's request. Do not call additional tools."
+
+           tool_message_content = (
+               f"Tool Result ({tool_name}):\n"
+               f"{tool_result}\n\n"
+               f"INSTRUCTION: {instruction}"
+           )
+
+           messages.append(ChatMessage(
+               role="tool",
+               content=tool_message_content,
+               tool_call_id=tool_call_id,
+               name=tool_name
+           ))
+
+           logger.info(f"  ✓ Tool result {i+1} added to history (tool_call_id={tool_call_id}, name={tool_name})")
+
+       logger.info(f"✅ Tools executed ({len(tool_results)} results)")

        # Continue loop - generate response with tool results
        logger.info(f"🔄 Generating response with tool results...")
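For reference, each per-tool history entry built in this hunk ends up shaped roughly like the following (the file content and call id here are made up for illustration):

```python
# Example of the message appended for a completed read call, assuming the
# parser produced a tool call id of "call_0":
tool_message = {
    "role": "tool",
    "tool_call_id": "call_0",
    "name": "read",
    "content": (
        "Tool Result (read):\n"
        "hello from notes.txt\n\n"
        "INSTRUCTION: The file contents are shown above. READ THIS FILE "
        "CONTENT ALOUD to the user. Do not call additional tools."
    ),
}
```

Splitting results into one message per call, keyed by `tool_call_id`, is what lets OpenAI-format clients associate each result with the call that produced it.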
@@ -341,20 +701,55 @@ async def handle_chat_completion(
        # Format with tool results (but DON'T include tool instruction - model should just use results)
        next_prompt = format_messages_with_tools(messages, None if use_opencode_tools else request.tools)

-       response_text, tokens_generated, tps = await _generate_with_local_swarm(
-           swarm_manager, next_prompt, request.max_tokens or 1024, request.temperature or 0.7
+       logger.info(f"📤 Prompt sent to model after tool execution:")
+       logger.info(f"  Total tokens: {count_tokens(next_prompt)}")
+       logger.info(f"  Messages in history: {len(messages)}")
+       for i, msg in enumerate(messages):
+           logger.info(f"  [{i}] {msg.role}: {msg.content[:100]}{'...' if len(msg.content) > 100 else ''}")
+           if msg.tool_calls:
+               for j, tc in enumerate(msg.tool_calls):
+                   logger.info(f"    Tool call {j}: {tc.function.get('name')} ({tc.function.get('arguments')})")
+           if msg.tool_call_id:
+               logger.info(f"    (tool_call_id: {msg.tool_call_id}, name: {msg.name})")
+       logger.debug(f"Full prompt:\n{next_prompt[:1000]}...")
+
+       response_text, tokens_generated, tps = await _generate_with_consensus(
+           prompt=next_prompt,
+           max_tokens=request.max_tokens or 1024,
+           temperature=request.temperature or 0.7,
+           swarm_manager=swarm_manager,
+           federated_swarm=None  # Tool result processing is local-only
        )

-       logger.info(f"Generated with tool results ({len(response_text)} chars, {tokens_generated} tokens)")
+       logger.info(f"✅ Generated with tool results ({len(response_text)} chars, {tokens_generated} tokens)")
        logger.debug(f"Response: {response_text[:200]}...")

        # Check for more tools in the new response
        parsed_content, tool_calls_parsed = parse_tool_calls(response_text)

+       # Log assistant response to chatlog
+       chat_logger.log_assistant_message(response_text, has_tool_calls=bool(tool_calls_parsed))
+
+       if tool_calls_parsed:
+           # Log each tool call
+           for i, tc in enumerate(tool_calls_parsed, 1):
+               tool_name = tc.get("function", {}).get("name", "")
+               args_str = tc.get("function", {}).get("arguments", "{}")
+               try:
+                   args_dict = json.loads(args_str) if isinstance(args_str, str) else args_str
+               except json.JSONDecodeError:
+                   args_dict = {"raw": args_str}
+               chat_logger.log_tool_call(tool_name, args_dict, i)
+
+       # Capture thinking if not already captured
+       if not thinking_captured:
+           thinking_content = parsed_content or ""
+           thinking_captured = True
+
        if not tool_calls_parsed:
            # No more tools - final answer
            logger.info(f"✅ Final answer (after tool execution) after {iteration} iteration(s)")
-           return _create_response(parsed_content, [], "stop", prompt, request, swarm_manager)
+           return _create_response(parsed_content, [], "stop", prompt, request, swarm_manager, thinking_content)

        # More tools detected - continue loop
        logger.info(f"🔧 More tools found - continuing loop")

@@ -362,4 +757,4 @@ async def handle_chat_completion(
    # Max iterations reached - force return last response
    logger.warning(f"⚠️ Max tool iterations ({max_iterations}) reached")
    logger.warning(f"⚠️ Returning last response (may include incomplete tool call)")
-   return _create_response(response_text, [], "stop", prompt, request, swarm_manager)
+   return _create_response(response_text, [], "stop", prompt, request, swarm_manager, thinking_content)
+13 -7
@@ -153,7 +153,13 @@ def _filter_messages(messages: List[ChatMessage]) -> List[ChatMessage]:


def _add_tool_instructions(messages: List[ChatMessage]) -> List[ChatMessage]:
-   """Add tool instructions to messages if needed.
+   """Add tool instructions to the beginning of messages.
+
+   Tool instructions are now ALWAYS injected by default so any client
+   (Continue, hollama, etc.) can use tools without requiring client-side
+   tool instruction injection.
+
+   TODO: Add a "plan mode" that disables tool use for planning-only conversations.

    Args:
        messages: List of chat messages
@@ -161,13 +167,13 @@ def _add_tool_instructions(messages: List[ChatMessage]) -> List[ChatMessage]:
    Returns:
        Messages with tool instructions added
    """
-   has_assistant = any(msg.role == "assistant" for msg in messages)
-
-   if has_assistant:
-       return messages
-
    tool_instructions = _load_tool_instructions()
-   logger.debug(f"Using {'opencode' if _USE_OPENCODE_TOOLS else 'local'} tool mode: {len(tool_instructions)} chars")
+   logger.debug(f"Injecting tool instructions: {len(tool_instructions)} chars")
+
+   # Check if instructions already present (avoid duplication)
+   if messages and messages[0].role == "system" and "AVAILABLE TOOLS" in messages[0].content:
+       logger.debug("Tool instructions already present, skipping injection")
+       return messages

    return [ChatMessage(role="system", content=tool_instructions)] + messages
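Assuming the loaded instructions contain the `AVAILABLE TOOLS` marker the guard looks for, injection is idempotent across turns. A usage sketch (import paths are an assumption):

```python
from api.message_utils import ChatMessage, _add_tool_instructions  # module path is an assumption

# Sketch: injection happens once, then becomes a no-op on the next turn.
msgs = [ChatMessage(role="user", content="list files in /tmp/demo")]
msgs = _add_tool_instructions(msgs)  # prepends the system message
msgs = _add_tool_instructions(msgs)  # detects "AVAILABLE TOOLS", no duplicate
assert sum(m.role == "system" for m in msgs) == 1
```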
+3 -3
@@ -29,11 +29,11 @@ class ToolCall(BaseModel):

class ChatMessage(BaseModel):
    """A chat message."""
-   role: Literal["system", "user", "assistant", "tool"] = Field(..., description="Role of message sender")
+   role: Literal["system", "user", "assistant", "tool"] = Field(..., description="Role of the message sender")
    content: str = Field(default="", description="Message content")
    tool_calls: Optional[List[ToolCall]] = Field(default=None, description="Tool calls from assistant")
-   #tool_call_id: Optional[str] = Field(default=None, description="ID of tool call this message is responding to")
-   #name: Optional[str] = Field(default=None, description="Name of the tool/function")
+   tool_call_id: Optional[str] = Field(default=None, description="ID of tool call this message is responding to")
+   name: Optional[str] = Field(default=None, description="Name of the tool/function")

    model_config = ConfigDict(
        # Use Pydantic's exclude_none to omit tool_calls when None
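Uncommenting `tool_call_id` and `name` stays backwards-compatible because serialization excludes `None` fields, so plain chat messages don't grow new keys. A self-contained sketch with a stand-in model (not the project's `ChatMessage`):

```python
from typing import Optional
from pydantic import BaseModel

class Msg(BaseModel):
    role: str
    content: str = ""
    tool_call_id: Optional[str] = None
    name: Optional[str] = None

# exclude_none keeps plain chat messages free of tool-only fields.
print(Msg(role="user", content="hi").model_dump_json(exclude_none=True))
# -> {"role":"user","content":"hi"}
print(Msg(role="tool", content="ok", tool_call_id="call_0", name="read")
      .model_dump_json(exclude_none=True))
```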
+221 -23
@@ -225,41 +225,128 @@ def set_federated_swarm(swarm):


async def _stream_response(response: ChatCompletionResponse):
-   """Stream a chat completion response as Server-Sent Events.
+   """Stream a chat completion response as Server-Sent Events using OpenCode-compatible format.

-   For compatibility with OpenAI format, we use delta format for streaming.
-   The response is sent as a single chunk since we don't support
-   true token-by-token streaming yet.
+   This implementation matches the Vercel AI SDK OpenAI-compatible format:
+   - Uses reasoning_content for thinking/planning (before tool calls)
+   - Properly streams tool_calls with incremental arguments
+   - Eventually switches to content for final answer
    """
    import json
    from api.models import ChatCompletionStreamResponse, ChatCompletionStreamChoice

-   # Convert to streaming format with delta
    message = response.choices[0].message
-   choice = ChatCompletionStreamChoice(
-       index=0,
-       delta={"content": message.content},
-       finish_reason="stop"
-   )
+   content = message.content or ""
+   tool_calls = message.tool_calls or []
+   thinking_content = getattr(response, '_thinking', None)  # Get thinking if attached

-   stream_response = ChatCompletionStreamResponse(
-       id=response.id,
-       created=response.created,
-       model=response.model,
-       choices=[choice]
-   )
+   # CASE 1: Response has tool calls - need to stream thinking + tool_calls separately
+   if tool_calls:
+       # Step 1: Stream reasoning_content (thinking) if there's any thinking captured
+       if thinking_content:
+           # Send reasoning in chunks to simulate streaming (in real implementation this would be token-by-token)
+           # For now, send as single reasoning block
+           chunk = {
+               "id": response.id,
+               "object": "chat.completion.chunk",
+               "created": response.created,
+               "model": response.model,
+               "choices": [{
+                   "index": 0,
+                   "delta": {
+                       "reasoning_content": thinking_content
+                   },
+                   "finish_reason": None
+               }]
+           }
+           yield f"data: {json.dumps(chunk)}\n\n"

-   # Send as SSE event
-   data = stream_response.model_dump_json(exclude_none=True)
-   logger.debug(f"Streaming SSE data (delta format): {len(data)} chars")
+       # Step 2: Emit tool_calls in the format OpenCode expects
+       for i, tc in enumerate(tool_calls):
+           # First chunk: tool_calls with empty arguments (just structure)
+           tc_id = tc.id
+           tc_name = tc.function.get("name", "")
+
+           chunk1 = {
+               "id": response.id,
+               "object": "chat.completion.chunk",
+               "created": response.created,
+               "model": response.model,
+               "choices": [{
+                   "index": 0,
+                   "delta": {
+                       "tool_calls": [{
+                           "index": i,
+                           "id": tc_id,
+                           "type": "function",
+                           "function": {
+                               "name": tc_name,
+                               "arguments": ""
+                           }
+                       }]
+                   },
+                   "finish_reason": None
+               }]
+           }
+           yield f"data: {json.dumps(chunk1)}\n\n"

-   yield f"data: {data}\n\n"
+           # Second chunk: arguments content (if any)
+           args = tc.function.get("arguments", "")
+           if args:
+               chunk2 = {
+                   "id": response.id,
+                   "object": "chat.completion.chunk",
+                   "created": response.created,
+                   "model": response.model,
+                   "choices": [{
+                       "index": 0,
+                       "delta": {
+                           "tool_calls": [{
+                               "index": i,
+                               "function": {
+                                   "arguments": args
+                               }
+                           }]
+                       },
+                       "finish_reason": None
+                   }]
+               }
+               yield f"data: {json.dumps(chunk2)}\n\n"

-   # Send done event
+       # Step 3: Final chunk with finish_reason="tool_calls"
+       final_chunk = {
+           "id": response.id,
+           "object": "chat.completion.chunk",
+           "created": response.created,
+           "model": response.model,
+           "choices": [{
+               "index": 0,
+               "delta": {},
+               "finish_reason": "tool_calls"
+           }]
+       }
+       yield f"data: {json.dumps(final_chunk)}\n\n"
+       yield "data: [DONE]\n\n"
+       return
+   # CASE 2: Pure text response (no tools) - stream as content
+   # This is the final answer after tool execution or a simple response
+   chunk = {
+       "id": response.id,
+       "object": "chat.completion.chunk",
+       "created": response.created,
+       "model": response.model,
+       "choices": [{
+           "index": 0,
+           "delta": {
+               "content": content
+           },
+           "finish_reason": "stop"
+       }]
+   }
+   yield f"data: {json.dumps(chunk)}\n\n"
    yield "data: [DONE]\n\n"

    logger.debug(f"Streaming complete")
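A client consuming this stream sees three kinds of deltas: `reasoning_content`, `tool_calls`, and `content`. A minimal consumer sketch, assuming the server from this diff is running on `localhost:8000`:

```python
import json
import httpx

# Walk the SSE stream and split reasoning, tool calls, and final content
# the way the chunks above are emitted.
with httpx.stream("POST", "http://localhost:8000/v1/chat/completions",
                  json={"messages": [{"role": "user", "content": "hi"}],
                        "stream": True}, timeout=None) as r:
    for line in r.iter_lines():
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        delta = json.loads(payload)["choices"][0]["delta"]
        if "reasoning_content" in delta:
            print("thinking:", delta["reasoning_content"])
        elif "tool_calls" in delta:
            print("tool delta:", delta["tool_calls"])
        elif "content" in delta:
            print("answer:", delta["content"])
```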
@router.post("/v1/chat/completions")
async def chat_completions(request: ChatCompletionRequest, fastapi_request: Request):
@@ -325,3 +412,114 @@ async def chat_completions(request: ChatCompletionRequest, fastapi_request: Request):
        logger.error(f"Error type: {type(e).__name__}")
        logger.error(f"Error message: {str(e)}")
        raise HTTPException(status_code=500, detail=f"Generation failed: {str(e)}")
+
+# Federation endpoint for peer-to-peer generation
+@router.post("/v1/federation/vote")
+async def federation_vote(request: Request):
+    """Handle federation vote request from a peer swarm.
+
+    This endpoint allows peer swarms to request generation from this swarm
+    as part of the federation consensus process.
+
+    IMPORTANT: Peer nodes should NOT execute tools. They only provide text
+    responses. The head node handles all tool execution after consensus.
+    """
+    try:
+        data = await request.json()
+        prompt = data.get("prompt", "")
+        max_tokens = data.get("max_tokens", 1024)
+        temperature = data.get("temperature", 0.7)
+
+        logger.info(f"🗳️ Federation vote request from {request.client.host}")
+        logger.debug(f"  Prompt: {prompt[:100]}...")
+
+        # Get swarm manager from app state
+        swarm_manager = getattr(request.app.state, 'swarm_manager', None)
+
+        if not swarm_manager:
+            raise HTTPException(status_code=503, detail="Swarm not ready")
+
+        # Strip tool instructions from prompt for peer generation
+        # Peers should only generate text - head node handles tools
+        # Look for system message with tool instructions and remove it
+        clean_prompt = _strip_tool_instructions(prompt)
+
+        # Generate response (text only, no tools)
+        start_time = time.time()
+        result = await swarm_manager.generate(
+            prompt=clean_prompt,
+            max_tokens=max_tokens,
+            temperature=temperature,
+            use_consensus=True
+        )
+
+        elapsed_ms = (time.time() - start_time) * 1000
+        response = result.selected_response
+
+        logger.info(f"✅ Federation vote complete ({response.tokens_generated} tokens, {elapsed_ms:.0f}ms)")
+
+        # Use actual confidence from consensus result instead of hardcoded value
+        # This ensures fair comparison between local and peer swarms
+        actual_confidence = result.confidence if hasattr(result, 'confidence') else 0.8
+
+        return {
+            "response": response.text,
+            "confidence": actual_confidence,
+            "latency_ms": elapsed_ms,
+            "worker_count": len(swarm_manager.workers) if hasattr(swarm_manager, 'workers') else 1,
+            "tokens_per_second": response.tokens_per_second,
+            "tokens_generated": response.tokens_generated
+        }
+
+    except Exception as e:
+        logger.exception("Error handling federation vote")
+        raise HTTPException(status_code=500, detail=f"Federation vote failed: {str(e)}")
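A head node's request to this endpoint, sketched with `httpx` based on the fields the handler reads and returns (the peer address is an assumption):

```python
import httpx

reply = httpx.post(
    "http://peer.local:8000/v1/federation/vote",
    json={"prompt": "Explain mDNS briefly.", "max_tokens": 256, "temperature": 0.7},
    timeout=120.0,
).json()
# The peer returns text plus metadata the head node can weigh.
print(reply["response"], reply["confidence"], reply["latency_ms"])
```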
+
+def _strip_tool_instructions(prompt: str) -> str:
+    """Strip tool instructions from prompt for peer generation.
+
+    Peers should not generate tool calls - only the head node handles tools.
+    This removes the system message containing tool instructions.
+
+    Args:
+        prompt: Original prompt with potential tool instructions
+
+    Returns:
+        Clean prompt without tool instructions
+    """
+    # Look for common tool instruction patterns
+    # Pattern 1: System message with "AVAILABLE TOOLS"
+    if "AVAILABLE TOOLS" in prompt or "You have access to tools" in prompt:
+        # Split by message boundaries and filter out system tool messages
+        lines = prompt.split('\n')
+        filtered_lines = []
+        skip_until_next_role = False
+
+        for line in lines:
+            # Check if this is a system message start with tool instructions
+            if ('<|im_start|>system' in line or line.strip() == 'system:') and not skip_until_next_role:
+                # Check if next few lines contain tool instructions
+                # We'll collect lines and check
+                filtered_lines.append(line)
+                skip_until_next_role = True
+                continue
+
+            if skip_until_next_role:
+                # Check for end of system message
+                if '<|im_end|>' in line or (line.strip().startswith('<|im_start|>') and 'system' not in line):
+                    skip_until_next_role = False
+                    filtered_lines.append(line)
+                # Check if this line contains tool instruction markers
+                elif any(marker in line for marker in ['AVAILABLE TOOLS', 'TOOL:', 'ARGUMENTS:', 'You have access to tools']):
+                    # Skip this line - it's part of tool instructions
+                    continue
+                else:
+                    filtered_lines.append(line)
+            else:
+                filtered_lines.append(line)
+
+        return '\n'.join(filtered_lines)
+
+    return prompt
+2 -1
@@ -44,8 +44,9 @@ class APIServer:
        @asynccontextmanager
        async def lifespan(app: FastAPI):
            """Lifespan context manager for startup/shutdown."""
-           # Startup: Set swarm manager in routes
+           # Startup: Set swarm manager in routes and app state
            set_swarm_manager(self.swarm_manager)
+           app.state.swarm_manager = self.swarm_manager  # For federation endpoint
            # Set tool mode in routes
            from api.routes import set_use_opencode_tools
            set_use_opencode_tools(self.use_opencode_tools)
@@ -0,0 +1,97 @@
+"""Chatlog for debugging tool execution.
+
+Writes a human-readable markdown log of tool calls and results.
+Enabled by setting LOCAL_SWARM_CHATLOG=1 environment variable.
+Log file defaults to 'chatlog.md' in the current working directory.
+"""
+
+import os
+import json
+from datetime import datetime
+from typing import Optional
+
+
+class ChatLogger:
+    """Logs chat interactions and tool execution in opencode-style format."""
+
+    def __init__(self, log_path: Optional[str] = None):
+        self.log_path = log_path or os.getenv('LOCAL_SWARM_CHATLOG_PATH', 'chatlog.md')
+        self.enabled = os.getenv('LOCAL_SWARM_CHATLOG', '0') == '1'
+        if self.enabled:
+            self._initialize_log()
+
+    def _initialize_log(self):
+        """Create log file with header if it doesn't exist."""
+        dir_path = os.path.dirname(self.log_path) or '.'
+        os.makedirs(dir_path, exist_ok=True)
+        with open(self.log_path, 'a') as f:
+            f.write(f"\n\n# Local Swarm Session - {datetime.now().isoformat()}\n\n")
+
+    def _timestamp(self) -> str:
+        """Get current timestamp."""
+        return datetime.now().strftime("%H:%M:%S")
+
+    def log_user_message(self, content: str):
+        """Log a user message."""
+        if not self.enabled:
+            return
+        with open(self.log_path, 'a') as f:
+            f.write(f"\n## [{self._timestamp()}] User\n\n")
+            f.write(f"{content}\n\n")
+
+    def log_assistant_message(self, content: str, has_tool_calls: bool = False):
+        """Log an assistant response."""
+        if not self.enabled:
+            return
+        with open(self.log_path, 'a') as f:
+            f.write(f"\n## [{self._timestamp()}] Assistant\n\n")
+            if has_tool_calls:
+                # Use thinking block for messages that contain tool calls
+                f.write(f"```thinking\n{content}\n```\n")
+            else:
+                f.write(f"{content}\n\n")
+
+    def log_tool_call(self, tool_name: str, arguments: dict, call_index: int = 1):
+        """Log a tool execution request."""
+        if not self.enabled:
+            return
+        with open(self.log_path, 'a') as f:
+            f.write(f"\n## [{self._timestamp()}] Tool Call #{call_index}\n\n")
+            f.write(f"**Tool:** `{tool_name}`\n\n")
+            f.write(f"**Arguments:**\n")
+            try:
+                args_json = json.dumps(arguments, indent=2)
+            except Exception:
+                args_json = str(arguments)
+            f.write(f"```json\n{args_json}\n```\n")
+
+    def log_tool_result(self, tool_name: str, result: str, call_index: int = 1, success: bool = True):
+        """Log a tool execution result."""
+        if not self.enabled:
+            return
+        with open(self.log_path, 'a') as f:
+            f.write(f"\n## [{self._timestamp()}] Tool Result #{call_index}\n\n")
+            status = "✓ Success" if success else "✗ Failed"
+            f.write(f"**Tool:** `{tool_name}` - {status}\n\n")
+            f.write(f"**Output:**\n")
+            f.write(f"```\n{result}\n```\n")
+
+    def log_system(self, message: str):
+        """Log a system message."""
+        if not self.enabled:
+            return
+        with open(self.log_path, 'a') as f:
+            f.write(f"\n## [{self._timestamp()}] System\n\n")
+            f.write(f"> {message}\n\n")
+
+
+# Global logger instance (lazy initialization handled per request)
+_global_logger: Optional[ChatLogger] = None
+
+
+def get_chat_logger() -> ChatLogger:
+    """Get the global chat logger instance (creates one if needed)."""
+    global _global_logger
+    if _global_logger is None:
+        _global_logger = ChatLogger()
+    return _global_logger
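Typical usage, assuming the module lives at `api.chatlog` (the import path is not shown in this diff). The environment variables must be set before the first `get_chat_logger()` call, since the global instance caches its config at construction:

```python
import os
os.environ["LOCAL_SWARM_CHATLOG"] = "1"                      # enable logging
os.environ["LOCAL_SWARM_CHATLOG_PATH"] = "debug/chatlog.md"  # optional override

from api.chatlog import get_chat_logger  # module path is an assumption

log = get_chat_logger()
log.log_user_message("read notes.txt in /tmp/demo")
log.log_tool_call("read", {"filePath": "notes.txt"}, 1)
log.log_tool_result("read", "hello world", 1, success=True)
```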
+27 -26
@@ -351,34 +351,35 @@ class FederatedSwarm:
        for vote in peer_votes:
            all_votes.append((vote.response_text, vote.confidence, vote.peer_name))

-       if self.consensus_strategy == "best_of_n":
-           # Use the consensus engine to pick the best response
-           from swarm.consensus import ConsensusEngine
+       # Always use quality-based selection - the head node judges ALL responses
+       # This prevents nodes from being overconfident about their own mediocre answers
+       from swarm.consensus import ConsensusEngine, GenerationResponse

-           responses = [
-               GenerationResponse(
-                   text=text,
-                   tokens_generated=0,
-                   tokens_per_second=0,
-                   latency_ms=0,
-                   backend_name=source
-               )
-               for text, _, source in all_votes
-           ]
+       responses = [
+           GenerationResponse(
+               text=text,
+               tokens_generated=0,
+               tokens_per_second=0,
+               latency_ms=0,
+               backend_name=source
+           )
+           for text, _, source in all_votes
+       ]

-           # Use synchronous quality scoring (no embeddings needed)
-           engine = ConsensusEngine(strategy="quality")
-           # _quality_vote is async but only uses sync scoring, so we
-           # use the simpler _fastest_vote-style approach here
-           scores = [engine._quality_score(r) for r in responses]
-           best_idx = max(range(len(scores)), key=lambda i: scores[i])
-           best = all_votes[best_idx]
-           print(f"  ✓ Selected response from {best[2]} (quality score: {scores[best_idx]:.2f})")
-           return best[0], best[2]
-
-       # Default: weighted selection - pick highest confidence
-       best = max(all_votes, key=lambda x: x[1])
-       print(f"  ✓ Selected response from {best[2]} (confidence: {best[1]:.2f})")
+       # Use quality scoring to objectively compare all responses
+       engine = ConsensusEngine(strategy="quality")
+       scores = [engine._quality_score(r) for r in responses]
+
+       # Find best response based on actual quality, not self-reported confidence
+       best_idx = max(range(len(scores)), key=lambda i: scores[i])
+       best = all_votes[best_idx]
+
+       # Show comparison
+       print(f"  📊 Quality scores:")
+       for i, (text, conf, source) in enumerate(all_votes):
+           print(f"    {source}: {scores[i]:.2f} (self-reported: {conf:.2f})")
+
+       print(f"  ✓ Selected response from {best[2]} (quality score: {scores[best_idx]:.2f})")
        return best[0], best[2]

    async def get_federation_status(self) -> Dict[str, Any]:
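`ConsensusEngine._quality_score` itself is not shown here. A hypothetical stand-in that rewards length, structure, and completeness, the criteria the commits describe, might look like this (weights and thresholds are made up):

```python
def quality_score(text: str) -> float:
    """Hypothetical stand-in for ConsensusEngine._quality_score: reward
    length (capped), structure, and a complete final sentence. The real
    metric is not shown in this diff."""
    length = min(len(text) / 2000, 1.0)           # capped length reward
    structure = min(text.count("\n") / 10, 1.0)   # paragraphs / lists present
    complete = 1.0 if text.rstrip().endswith((".", "!", "?", "`")) else 0.5
    return 0.5 * length + 0.3 * structure + 0.2 * complete
```

Because the same scorer runs on every vote at the head node, a peer can't win simply by reporting a high confidence for itself.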
+37 -16
@@ -121,6 +121,13 @@ class ToolExecutor:
        if not file_path:
            return "Error: filePath required"

+       # Check if original path was absolute or used ~ before expansion
+       original_was_absolute = os.path.isabs(file_path) or file_path.startswith("~")
+
+       # Expand ~ to home directory
+       file_path = os.path.expanduser(file_path)
+       working_dir = os.path.expanduser(working_dir)
+
        # Security: Prevent directory traversal
        file_path = os.path.normpath(file_path)
        if file_path.startswith("..") or file_path.startswith("/.."):
@@ -132,14 +139,16 @@ class ToolExecutor:
        else:
            full_path = file_path

-       # Additional security: ensure resolved path is within working_dir
-       try:
-           real_working_dir = os.path.realpath(working_dir)
-           real_full_path = os.path.realpath(full_path)
-           if not real_full_path.startswith(real_working_dir):
-               return f"Error: Access denied - path outside working directory"
-       except Exception:
-           pass  # If realpath fails, continue anyway
+       # Additional security: only enforce working_dir restriction for relative paths
+       # If user explicitly specified an absolute path or ~ path, allow it
+       if not original_was_absolute:
+           try:
+               real_working_dir = os.path.realpath(working_dir)
+               real_full_path = os.path.realpath(full_path)
+               if not real_full_path.startswith(real_working_dir):
+                   return f"Error: Access denied - path outside working directory"
+           except Exception:
+               pass  # If realpath fails, continue anyway

        logger.debug(f"  📁 Reading: {file_path}")
        logger.debug(f"  📍 Working dir: {working_dir}")
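One caveat with the retained `startswith` containment test: it treats `/data/workdir-evil` as inside `/data/workdir`. A stricter sketch using `os.path.commonpath` (not what this diff does, just an alternative worth noting):

```python
import os

def inside(base: str, candidate: str) -> bool:
    """Sketch of a stricter containment test. A plain startswith check
    lets '/data/workdir-evil' pass for base '/data/workdir'; comparing
    with os.path.commonpath avoids that edge case."""
    base_real = os.path.realpath(base)
    cand_real = os.path.realpath(candidate)
    return os.path.commonpath([base_real, cand_real]) == base_real

print(inside("/data/workdir", "/data/workdir/notes.txt"))   # True
print(inside("/data/workdir", "/data/workdir-evil/x.txt"))  # False
```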
@@ -163,6 +172,13 @@ class ToolExecutor:
        if not file_path:
            return "Error: filePath required"

+       # Check if original path was absolute or used ~ before expansion
+       original_was_absolute = os.path.isabs(file_path) or file_path.startswith("~")
+
+       # Expand ~ to home directory
+       file_path = os.path.expanduser(file_path)
+       working_dir = os.path.expanduser(working_dir)
+
        # Security: Prevent directory traversal
        file_path = os.path.normpath(file_path)
        if file_path.startswith("..") or file_path.startswith("/.."):
@@ -174,14 +190,16 @@ class ToolExecutor:
        else:
            full_path = file_path

-       # Additional security: ensure resolved path is within working_dir
-       try:
-           real_working_dir = os.path.realpath(working_dir)
-           real_full_path = os.path.realpath(full_path)
-           if not real_full_path.startswith(real_working_dir):
-               return f"Error: Access denied - path outside working directory"
-       except Exception:
-           pass  # If realpath fails, continue anyway
+       # Additional security: only enforce working_dir restriction for relative paths
+       # If user explicitly specified an absolute path or ~ path, allow it
+       if not original_was_absolute:
+           try:
+               real_working_dir = os.path.realpath(working_dir)
+               real_full_path = os.path.realpath(full_path)
+               if not real_full_path.startswith(real_working_dir):
+                   return f"Error: Access denied - path outside working directory"
+           except Exception:
+               pass  # If realpath fails, continue anyway

        logger.debug(f"  📁 Writing: {file_path}")
        logger.debug(f"  📍 Working dir: {working_dir}")
@@ -208,6 +226,9 @@ class ToolExecutor:
        if not command:
            return "Error: command required"

+       # Expand ~ to home directory in cwd
+       cwd = os.path.expanduser(cwd)
+
        # Security: Block dangerous commands
        dangerous = ["rm -rf /", "> /dev", "mkfs", "dd if=/dev/zero", ":(){ :|:& };:"]
        for d in dangerous:
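The hunk is cut off mid-loop by the diff context; the blocklist check presumably continues along these lines (a sketch, the error-string return style is an assumption based on the other handlers):

```python
def check_command(command: str) -> str | None:
    """Sketch of the substring blocklist this hunk sets up; the loop body
    is cut off by the diff, so the return style here is an assumption."""
    dangerous = ["rm -rf /", "> /dev", "mkfs", "dd if=/dev/zero", ":(){ :|:& };:"]
    for d in dangerous:
        if d in command:
            return f"Error: Dangerous command blocked ({d})"
    return None  # command is allowed
```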