# Feedback File Structure Template
Use this structure for all feedback files to maintain consistency across the repository.
## Standard Header
```markdown
# [Model Name] with [Harness] - Feedback Report
**Model:** [Full model name]
**Size:** [Parameters, e.g., 27B, 30B-A3B MoE]
**Provider:** [Company/API, e.g., OpenAI, Anthropic, Ollama]
**Harness:** [Harness name, e.g., OpenCode, Hermes, ForgeCode, pi]
**Date Compiled:** [YYYY-MM-DD]
**Source References:** [Primary sources]
---
```
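For reference, a completed header might look like this (the model name, harness, and values below are illustrative placeholders, not taken from a real report):

```markdown
# ExampleModel-30B with OpenCode - Feedback Report
**Model:** ExampleModel-30B-Instruct
**Size:** 30B-A3B MoE
**Provider:** Ollama (local)
**Harness:** OpenCode
**Date Compiled:** 2026-01-15
**Source References:** Harness issue tracker, maintainer blog post
---
```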
## Required Sections
### 1. Quick Reference
```markdown
## Quick Reference
| Attribute | Value |
|-----------|-------|
| Model | [Name] |
| Size | [Parameters] |
| Context Window | [e.g., 128K, 1M] |
| Best For | [Use case summary] |
| Cost | [If applicable] |
```
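A filled-in table might look like this (all values are illustrative):

```markdown
## Quick Reference
| Attribute | Value |
|-----------|-------|
| Model | ExampleModel-30B-Instruct |
| Size | 30B-A3B MoE |
| Context Window | 128K |
| Best For | Local agentic coding on consumer GPUs |
| Cost | Free (self-hosted) |
```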
### 2. Benchmark Results
```markdown
## Benchmark Results
### [Benchmark Name]
- **Score:** [X%] (Rank #Y)
- **Harness:** [If Terminal-Bench or harness-specific]
- **Date:** [When tested]
- **Note:** [Any important context]
**Important:** For Terminal-Bench, always note that scores are harness+model combinations.
```
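For example, a Terminal-Bench entry following this structure (scores and rank are made up for illustration):

```markdown
### Terminal-Bench
- **Score:** 42.0% (Rank #12)
- **Harness:** OpenCode (score reflects the harness+model combination)
- **Date:** 2026-01-10
- **Note:** Self-reported by the harness developers; not independently verified
```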
### 3. What Worked Well
```markdown
## What Worked Well
1. **[Key Point]**
- Detailed explanation
- Supporting evidence
2. **[Key Point]**
- Details
```
### 4. Issues Encountered
```markdown
## Issues Encountered
1. **[Issue Title]**
- **Severity:** [Critical/Major/Minor]
- **Description:** Details
- **Workaround:** If any
2. **[Issue Title]**
- Details
```
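A completed issue entry might read (the issue itself is a hypothetical example):

```markdown
1. **Tool calls emitted as plain text**
   - **Severity:** Major
   - **Description:** The model occasionally prints the tool-call JSON into the
     chat stream instead of invoking the tool, stalling the agent loop.
   - **Workaround:** Lower temperature and re-prompt; the retry usually succeeds.
```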
### 5. Configuration (Optional)
````markdown
## Configuration
```json
[Configuration example]
```
Or for CLI flags:
```bash
[Command line options]
```
````
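As a sketch, a JSON configuration block could look like the following. The keys here are illustrative only; consult the specific harness's documentation for its actual option names:

```json
{
  "model": "example/model-30b-instruct",
  "temperature": 0.2,
  "max_tokens": 8192
}
```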
### 6. Source References
```markdown
## Source References
1. **[Source Name]**: [URL]
- [Brief description of what it covers]
2. **[Source Name]**: [URL]
- Description
```
## For Multi-Model Files
If a file covers multiple models, use this structure:
```markdown
# [Topic] Feedback for [Harness]
**Date Compiled:** [YYYY-MM-DD]
**Source References:** [Primary sources]
---
## Model Reference Guide
| Model | Size | Provider | Notes |
|-------|------|----------|-------|
| [Name] | [Size] | [Provider] | [Key info] |
---
## [Model 1]
[Follow standard sections above]
---
## [Model 2]
[Follow standard sections above]
```
## Style Guidelines
1. **Use tables** for comparative data
2. **Use bullet points** for lists
3. **Use numbered lists** for sequential steps or ranked items
4. **Bold** key terms and metrics
5. *Italics* for emphasis
6. `Code formatting` for commands, file names, and technical terms
7. **Always cite sources** with full URLs
8. **Note dates** for time-sensitive information
## Special Notes
### Terminal-Bench
Always clarify that Terminal-Bench scores represent **harness+model** combinations, not raw model capability. Include the harness name in the benchmark table.
### Qwen Models
Include the Model Reference Guide when discussing Qwen models to avoid confusion between the Qwen3, Qwen3.5, and Qwen2.5 families.
Current Qwen3.5 MoE models include: 27B, 35B-A3B, 122B-A10B, 397B-A17B.
### Verified vs Self-Reported
Note when benchmark scores are:
- **Verified:** Independently validated (e.g., SWE-bench Verified)
- **Self-Reported:** Submitted by the harness developers themselves