diff --git a/FEEDBACK_TEMPLATE.md b/FEEDBACK_TEMPLATE.md index e12eefd..deb6ad2 100644 --- a/FEEDBACK_TEMPLATE.md +++ b/FEEDBACK_TEMPLATE.md @@ -152,6 +152,8 @@ Always clarify that Terminal-Bench scores represent **harness+model** combinatio ### Qwen Models Include the Model Reference Guide when discussing Qwen models to avoid confusion between Qwen3, Qwen 3.5, and Qwen2.5 families. +Current Qwen 3.5 MoE models include: 27B, 35B-A3B, 122B-A10B, 397B-A17B. + ### Verified vs Self-Reported Note when benchmark scores are: - **Verified:** Independently validated (e.g., SWE-bench Verified) diff --git a/opencode/opencode/feedback/SUMMARY.md b/opencode/opencode/feedback/SUMMARY.md index 5b48f2b..fab8d8d 100644 --- a/opencode/opencode/feedback/SUMMARY.md +++ b/opencode/opencode/feedback/SUMMARY.md @@ -16,14 +16,12 @@ This document provides a comprehensive summary of community feedback, benchmark | Rank | Model | Strengths | Best For | |------|-------|-----------|----------| -| 1 | **Qwen3-30B-A3B** | Best balance of speed, accuracy, context (128k) | General coding, long-context tasks | +| 1 | **Qwen3.5-35B-A3B** | Best balance of speed, accuracy, context (262k native, 1M extended) | General coding, long-context tasks | | 2 | **Gemma 4 26B-A4B** | Excellent on M-series Mac, 8W power usage | Laptop development, M5 MacBook | | 3 | **GLM-5.1** | SWE-Bench Pro #1 (58.4), 8-hour autonomy | Long-horizon tasks, enterprise | | 4 | **Nemotron 3 Super** | PinchBench 85.6%, 1M context | Agentic reasoning, GPU clusters | | 5 | **Gemma 4 8B** | Runs on 16GB RAM, fast | Quick tasks, modest hardware | -**Note:** "Qwen3.5-35B-A3B" community references likely mean **Qwen3-30B-A3B**. Qwen 3.5 MoE sizes: 27B, 122B-A10B, 397B-A17B. - ### 2. Best Frontier Models for OpenCode | Rank | Model | Strengths | Best For | @@ -96,7 +94,7 @@ This document provides a comprehensive summary of community feedback, benchmark **File:** `opencode/feedback/localllm/local-llm-feedback.md` **Contents:** -- Qwen3-30B-A3B (MoE) - Detailed performance data (Note: community "Qwen3.5-35B-A3B" references) +- Qwen3.5-35B-A3B (MoE) - Detailed performance data - Gemma 4 26B-A4B - M-series Mac optimization - GLM-4.7 Flash - API performance - GLM-5.1 - 8-hour autonomous capability @@ -237,7 +235,7 @@ This document provides a comprehensive summary of community feedback, benchmark ## Recommendations ### For Local Development -1. **Qwen3-30B-A3B** - Best overall local model (Note: community references to "Qwen3.5-35B-A3B") +1. **Qwen3.5-35B-A3B** - Best overall local model (35B/3B MoE, 262k context) 2. **Gemma 4 26B-A4B** - Best for M-series Mac 3. **Increase context to 32K+** 4. **Use corrected chat templates** @@ -280,7 +278,7 @@ This document provides a comprehensive summary of community feedback, benchmark The OpenCode ecosystem has matured significantly with strong support for both local and frontier models. Key findings: 1. **Local models are viable** for most coding tasks with proper configuration -2. **Qwen3-30B-A3B** (often referenced as "Qwen3.5-35B-A3B") is the best local model overall +2. **Qwen3.5-35B-A3B** is the best local model overall (35B/3B MoE, Apache 2.0) 3. **GLM-5.1** is the best frontier model (SWE-Bench Pro #1) 4. **Context management** is critical for long-running sessions 5. **Hybrid setups** offer the best of both worlds diff --git a/opencode/opencode/feedback/localllm/local-llm-feedback.md b/opencode/opencode/feedback/localllm/local-llm-feedback.md index c09689a..289056f 100644 --- a/opencode/opencode/feedback/localllm/local-llm-feedback.md +++ b/opencode/opencode/feedback/localllm/local-llm-feedback.md @@ -12,31 +12,34 @@ This document compiles community feedback, benchmark results, and performance ob | Model Family | Available Sizes | Type | Notes | |--------------|-----------------|------|-------| | **Qwen 3.5** | 0.8B, 2B, 4B, 9B | Dense | Released Feb 2026 | -| **Qwen 3.5** | 27B, 122B-A10B, 397B-A17B | MoE | Released Feb 2026 | +| **Qwen 3.5** | 27B, 35B-A3B, 122B-A10B, 397B-A17B | MoE | Released Feb 2026 | | **Qwen3** | 0.6B, 1.7B, 4B, 8B, 14B, 32B | Dense | Released April 2025 | | **Qwen3** | 30B-A3B, 235B-A22B | MoE | Released April 2025 | | **Qwen2.5** | 0.5B, 1.5B, 3B, 7B, 14B, 32B, 72B | Dense | + Coder variants | -> **Note:** "Qwen3.5-35B-A3B" references in community posts likely mean **Qwen3-30B-A3B** (from the Qwen3 MoE family) or are speculative. Qwen 3.5 MoE sizes are 27B, 122B-A10B, and 397B-A17B. - --- -### Qwen3-30B-A3B (MoE) [Most likely model referenced] -**Model:** Qwen3-30B-A3B (not Qwen 3.5) -**Size:** 30B total / 3B active parameters -**Quantization:** Q4_K_M, Q8_0, UD-Q4_K_XL -**Provider:** llama.cpp / Ollama / HuggingFace +### Qwen3.5-35B-A3B (MoE) +**Model:** Qwen3.5-35B-A3B +**Size:** 35B total / 3B active parameters +**Quantization:** Q4_K_M, Q8_0, UD-Q4_K_XL, GPTQ-Int4 +**Provider:** llama.cpp / Ollama / vLLM / HuggingFace +**Context:** 262k native, up to 1M extended +**License:** Apache 2.0 **Benchmark Results:** - **Performance:** 3-5x faster than dense variants (~60-100 tok/s) -- **Context:** Supports up to 128k context +- **Context:** Supports up to 262k context (1M extended) +- **MMLU-Pro:** 85.3% +- **SWE-bench Verified:** 69.2% - **Accuracy:** Excellent on coding tasks, comparable to cloud models **What Worked Well:** -- Long context handling (128k tested) +- Long context handling (262k tested, 1M extended) - Fast inference due to MoE architecture - Good tool calling with corrected chat templates - Works well with OpenCode's skill system +- Apache 2.0 license (open source) **Issues Encountered:** - Default chat template breaks tool-calling in OpenCode @@ -52,7 +55,7 @@ This document compiles community feedback, benchmark results, and performance ob --batch-size 2048 --ubatch-size 512 --jinja ---chat-template-file qwen3-chat-template-corrected.jinja +--chat-template-file qwen35-chat-template-corrected.jinja --context-shift ``` @@ -324,14 +327,12 @@ docker model configure --context-size=100000 gpt-oss:20B-UD-Q8_K_XL ### Best Local Models for OpenCode (Ranked) -1. **Qwen3-30B-A3B** (or Qwen 3.5 27B-A3B if available) - Best balance of speed, accuracy, context +1. **Qwen3.5-35B-A3B** - Best overall balance of speed, accuracy, context (262k native, 1M extended) 2. **Gemma 4 26B-A4B** - Best for M-series Mac, very efficient 3. **GLM-5.1** - Best for long-horizon tasks (requires enterprise hardware) 4. **Nemotron 3 Super** - Best for agentic reasoning (enterprise hardware) 5. **Gemma 4 8B** - Best for quick tasks on modest hardware -**Note:** Community references to "Qwen3.5-35B-A3B" likely mean **Qwen3-30B-A3B** from the Qwen3 family (not Qwen 3.5). Qwen 3.5 MoE models come in 27B, 122B-A10B, and 397B-A17B sizes. - ### Hybrid Setup Strategy - **Local models:** Lightweight tasks, repetitive work, privacy-sensitive - **Cloud models:** Complex reasoning, multi-file refactors, deep analysis