Harnesses under analysis:
- opencode (Go-based coding agent)
- pi (minimal terminal coding harness by Mario Zechner)
- hermes (Nous Research agent)
- forgecode (AI pair programmer with sub-agents)

Each harness folder contains:
- repo/: Source code from the respective repositories
- feedback/localllm/: Community feedback for local/smaller models
- feedback/frontier/: Community feedback for frontier models

Research focus: Tool handling, skills systems, prompt engineering, context management, and best practices for smaller/local models.
Qwen 3.5 with ForgeCode - Feedback Report
Model: Qwen 3.5
Provider: Alibaba Cloud (via local inference)
Harness: ForgeCode
Source References: GitHub Issue #2894, Reddit r/LocalLLaMA
Date Compiled: April 9, 2026
Known Issues
Multiple System Messages Bug
GitHub Issue: #2894 (Open as of April 8, 2026)
Problem: Multiple system messages break models with strict chat templates (e.g., Qwen3.5)
Error Manifestation:
- Models with strict chat templates fail to parse message structure correctly
- Tool calling may fail or produce incorrect results
- Agent behavior becomes unpredictable
Impact:
- Affects local inference with llama.cpp, Ollama, and similar servers
- Qwen3.5 specifically mentioned as affected
Workaround Status: No official fix yet; issue under investigation
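Since there is no official fix yet, one client-side mitigation is to collapse the conversation's system messages into a single leading one before the request reaches the backend. The sketch below is a hypothetical helper (not part of ForgeCode), assuming OpenAI-style message dicts with `role` and `content` keys:

```python
def merge_system_messages(messages):
    """Collapse all system messages into a single leading one.

    Strict chat templates (e.g. Qwen3.5's) may reject or mis-parse
    conversations containing more than one system message, so this
    concatenates their contents and emits exactly one at the start.
    """
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    if not system_parts:
        return rest
    merged = {"role": "system", "content": "\n\n".join(system_parts)}
    return [merged] + rest
```

Note that merging changes where instructions appear in the transcript, which can subtly alter model behavior; it trades fidelity to the harness's multi-message design for template compatibility.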
Tool Calling with Qwen Models
General Observations from Community
- Qwen3-Coder Next shows promise as the "first usable coding model < 60GB"
- Tool calling reliability varies by inference backend:
  - LM Studio 0.4.9 reportedly handles Qwen3.5 XML tool parsing more reliably than raw llama.cpp
  - llama.cpp with the `--jinja` flag helps with tool calling
- The `finish_reason` issue is annoying to debug, according to community reports
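Because the `finish_reason` failures surface only as odd downstream agent behavior, it can help to inspect responses before handing them to the agent loop. A minimal sketch, assuming an OpenAI-compatible chat completion response represented as a plain dict (the `check_finish_reason` helper is hypothetical):

```python
def check_finish_reason(response):
    """Flag suspicious finish_reason values in an OpenAI-compatible
    chat completion response (a plain dict here).

    Local backends sometimes report "length", None, or another
    unexpected value when a tool call was truncated or mis-parsed,
    which silently breaks agent loops; surfacing it early makes the
    failure mode visible instead of leaving it to guesswork.
    """
    problems = []
    for choice in response.get("choices", []):
        reason = choice.get("finish_reason")
        if reason not in ("stop", "tool_calls"):
            problems.append((choice.get("index", 0), reason))
    return problems
```

Logging whatever this returns alongside the raw response is usually enough to tell a truncation problem from a template-parsing one.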
Recommendations for Local Use
- Prefer LM Studio over raw llama.cpp for more reliable tool parsing
- Monitor the system message count: ForgeCode's multi-message approach is a known issue with strict chat templates
- Test thoroughly before relying on Qwen 3.5 for production tasks via ForgeCode
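The monitoring recommendation above can be reduced to a small pre-flight check run before each dispatch. A sketch with a hypothetical `preflight_check` helper, again assuming OpenAI-style message dicts:

```python
import logging

def preflight_check(messages, max_system=1):
    """Warn when a request contains more system messages than a
    strict chat template is likely to tolerate.

    Returns the count so callers can decide whether to merge,
    drop, or abort before dispatching to the backend.
    """
    count = sum(1 for m in messages if m.get("role") == "system")
    if count > max_system:
        logging.warning(
            "Request has %d system messages; strict chat templates "
            "(e.g. Qwen3.5) may mis-parse it.", count
        )
    return count
```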