mid_model_research/forgecode/feedback/localllm/qwen-3.5.md

# Qwen Models with ForgeCode - Feedback Report

**Models Covered:** Qwen 3.5, Qwen3
**Provider:** Alibaba Cloud (via local inference)
**Harness:** ForgeCode
**Source References:** GitHub Issue #2894, Reddit r/LocalLLaMA
**Date Compiled:** April 9, 2026

---

## Model Reference Guide

| Model Family | Available Sizes | Notes |
|--------------|-----------------|-------|
| **Qwen 3.5** | 0.8B, 2B, 4B, 9B (dense); 27B, 122B-A10B, 397B-A17B (MoE) | Released Feb 2026 |
| **Qwen3** | 0.6B, 1.7B, 4B, 8B, 14B, 32B (dense); 30B-A3B, 235B-A22B (MoE) | Released April 2025 |
| **Qwen2.5** | 0.5B, 1.5B, 3B, 7B, 14B, 32B, 72B + Coder variants | Earlier generation |

> **Note:** References to "Qwen 3.5 14B" in community discussions likely mean Qwen3-14B or Qwen2.5-14B.

---

## Known Issues

### Multiple System Messages Bug
**GitHub Issue:** #2894 (Open as of April 8, 2026)

**Problem:** Multiple system messages break models with strict chat templates (e.g., Qwen3, Qwen 3.5)

**Error Manifestation:**
- Models with strict chat templates fail to parse message structure correctly
- Tool calling may fail or produce incorrect results
- Agent behavior becomes unpredictable

**Impact:**
- Affects local inference with llama.cpp, Ollama, and similar servers
- Qwen3 and Qwen 3.5 specifically mentioned as affected

**Workaround Status:** No official fix yet; issue under investigation

---

## Tool Calling with Qwen Models

### General Observations from Community

1. **Qwen3-Coder Next** shows promise as "first usable coding model < 60GB"
2. **Tool calling reliability varies** by inference backend:
   - LM Studio 0.4.9 reportedly handles Qwen3.5 XML tool parsing more reliably than raw llama.cpp
   - llama.cpp with `--jinja` flag helps with tool calling

3. **finish_reason issue** is annoying to debug according to community reports

---

## Recommendations for Local Use

1. **Use LM Studio** for more reliable tool parsing vs raw llama.cpp
2. **Monitor system message count** - known issue with ForgeCode's multi-message approach
3. **Test thoroughly** before relying on Qwen 3.5 for production tasks via ForgeCode

---

## Source References

1. **GitHub Issue:** https://github.com/antinomyhq/forgecode/issues/2894
2. **Reddit r/LocalLLaMA:** https://www.reddit.com/r/LocalLLaMA/comments/1sdhvc5/qwen_35_tool_calling_fixes_for_agentic_use_whats/