# Qwen 3.5 with ForgeCode - Feedback Report

**Model:** Qwen 3.5

**Provider:** Alibaba Cloud (via local inference)

**Harness:** ForgeCode

**Source References:** GitHub Issue #2894, Reddit r/LocalLLaMA

**Date Compiled:** April 9, 2026

---
## Known Issues

### Multiple System Messages Bug

**GitHub Issue:** #2894 (open as of April 8, 2026)

**Problem:** Multiple system messages break models with strict chat templates (e.g., Qwen 3.5).

**Error Manifestation:**

- Models with strict chat templates fail to parse the message structure correctly
- Tool calling may fail or produce incorrect results
- Agent behavior becomes unpredictable

**Impact:**

- Affects local inference with llama.cpp, Ollama, and similar servers
- Qwen 3.5 is specifically mentioned as affected

**Workaround Status:** No official fix yet; the issue is under investigation.
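Until an upstream fix lands, one possible client-side mitigation is to collapse every system message into a single leading one before the request reaches the strict chat template. A minimal sketch, assuming OpenAI-style `{"role": ..., "content": ...}` message dicts (the function name is ours, not ForgeCode's):

```python
def merge_system_messages(messages):
    """Collapse all system messages into one leading system message.

    Strict chat templates (e.g. Qwen 3.5's) expect at most one system
    message; this joins the content of every system message with blank
    lines and puts the merged result first, preserving the relative
    order of all other messages.
    """
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    if not system_parts:
        return rest
    merged = {"role": "system", "content": "\n\n".join(system_parts)}
    return [merged] + rest
```

This only papers over the symptom on the client side; whether it is safe depends on why the harness emits multiple system messages in the first place.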
---

## Tool Calling with Qwen Models

### General Observations from Community
1. **Qwen3-Coder Next** shows promise as the "first usable coding model < 60GB"
2. **Tool-calling reliability varies** by inference backend:
   - LM Studio 0.4.9 reportedly handles Qwen 3.5 XML tool parsing more reliably than raw llama.cpp
   - llama.cpp with the `--jinja` flag (which applies the model's built-in chat template) helps with tool calling
3. The **`finish_reason` issue** is reportedly annoying to debug
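Because `finish_reason` problems tend to surface as silent misbehavior rather than errors, it can help to audit the field after every completion. A hedged sketch, assuming an OpenAI-compatible response dict (the chat-completions schema that llama.cpp's server and LM Studio both emulate; the helper name is illustrative):

```python
def audit_finish_reason(response):
    """Return (finish_reason, warning) for the first choice of an
    OpenAI-style chat completion; warning is None if nothing looks off."""
    choice = response["choices"][0]
    reason = choice.get("finish_reason")
    if reason == "length":
        # Generation hit the token limit; tool-call JSON/XML may be cut off.
        return reason, "output truncated; consider raising max_tokens"
    if choice.get("message", {}).get("tool_calls") and reason != "tool_calls":
        # Backend parsed tool calls but reported a different stop reason.
        return reason, "tool calls present but finish_reason disagrees"
    return reason, None
```

Logging the returned warning on every turn makes the failure modes described above visible instead of leaving the agent to act on a truncated tool call.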
---

## Recommendations for Local Use
1. **Use LM Studio** for more reliable tool parsing than raw llama.cpp
2. **Monitor system message count** - a known problem with ForgeCode's multi-message approach (GitHub issue #2894)
3. **Test thoroughly** before relying on Qwen 3.5 for production tasks via ForgeCode
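Recommendation 2 can be automated with a tiny pre-flight check on the outgoing request payload. A sketch assuming OpenAI-style request dicts (the function name is illustrative):

```python
def system_message_count(payload):
    """Count system messages in an OpenAI-style chat request payload.

    More than one is a red flag for strict chat templates such as
    Qwen 3.5's (the failure mode reported in GitHub issue #2894).
    """
    return sum(1 for m in payload.get("messages", [])
               if m.get("role") == "system")
```

Wiring this into a request hook (and warning when the count exceeds one) catches the multi-system-message condition before the model ever sees it.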
---

## Source References

1. **GitHub Issue:** https://github.com/antinomyhq/forgecode/issues/2894
2. **Reddit r/LocalLLaMA:** https://www.reddit.com/r/LocalLLaMA/comments/1sdhvc5/qwen_35_tool_calling_fixes_for_agentic_use_whats/