Qwen3.5 model implementation #6

Closed
opened 2026-05-10 12:01:31 +02:00 by sleepy · 1 comment
Owner

Implement the Qwen3.5 model family:

  • src/models/qwen3_5/model.zig — Model struct, load(), forward()
  • src/models/qwen3_5/config.zig — Hyperparameters (comptime where possible)
  • src/models/qwen3_5/attention.zig — GQA dispatch to attention.metal
  • src/models/qwen3_5/mlp.zig — SwiGLU dispatch to swiglu.metal
  • src/models/qwen3_5/tokenizer.zig — BPE from tokenizer.json
  • src/models/qwen3_5/mtp.zig — MTP head dispatch
  • src/models/qwen3_5/README.md
  • src/models/registry.zig — Model family registration hook
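
For checking the output of the swiglu.metal dispatch listed above, a small CPU reference can help. The following is a minimal sketch, not the repository's code: it assumes the kernel computes the usual element-wise SwiGLU gating, `silu(gate) * up`, on already-projected inputs. The function names `silu` and `swiglu` are hypothetical.

```python
import math


def silu(x: float) -> float:
    """SiLU (swish) activation, x * sigmoid(x), written to avoid exp overflow."""
    if x >= 0.0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)  # x < 0, so e is in (0, 1) and cannot overflow
    return x * e / (1.0 + e)


def swiglu(gate: list[float], up: list[float]) -> list[float]:
    """Element-wise SwiGLU gating: silu(gate[i]) * up[i].

    Assumption: gate and up are the outputs of the gate and up projections;
    the down projection is applied afterwards and is not shown here.
    """
    if len(gate) != len(up):
        raise ValueError("gate and up must have the same length")
    return [silu(g) * u for g, u in zip(gate, up)]
```

A unit test could run the kernel on a small buffer and compare each element against `swiglu` within a floating-point tolerance.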

Acceptance criteria:

  • Model can load Qwen3.5-4B weights
  • Forward pass runs end-to-end
  • Tokenizer produces correct token IDs
  • MTP head works (if weights present)
  • Unit tests pass
  • No file exceeds ~400 lines
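
The "correct token IDs" criterion can be illustrated with a toy BPE encoder. This is a sketch under stated assumptions, not the tokenizer.zig implementation: it applies ranked merges greedily to a single word and omits the pre-tokenization and byte-level mapping that a real tokenizer.json would also define. `TOY_MERGES`, `TOY_VOCAB`, `bpe_encode`, and `bpe_decode` are hypothetical names with made-up data.

```python
# Toy merge table (earlier entries have higher priority) and vocabulary.
TOY_MERGES = [("l", "o"), ("lo", "w")]
TOY_VOCAB = {"l": 0, "o": 1, "w": 2, "lo": 3, "low": 4, "e": 5, "r": 6}


def bpe_encode(word: str, merges: list[tuple[str, str]], vocab: dict[str, int]) -> list[int]:
    """Encode one word by repeatedly applying the best-ranked adjacent merge."""
    ranks = {pair: i for i, pair in enumerate(merges)}
    symbols = list(word)
    while len(symbols) > 1:
        pairs = [(symbols[i], symbols[i + 1]) for i in range(len(symbols) - 1)]
        best = min(pairs, key=lambda p: ranks.get(p, float("inf")))
        if best not in ranks:
            break  # no adjacent pair is mergeable any more
        merged, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                merged.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return [vocab[s] for s in symbols]


def bpe_decode(ids: list[int], vocab: dict[str, int]) -> str:
    """Invert the vocabulary and concatenate the token strings."""
    inverse = {token_id: token for token, token_id in vocab.items()}
    return "".join(inverse[i] for i in ids)
```

A round-trip check (`bpe_decode(bpe_encode(w, ...), ...) == w`) is a cheap sanity test, but correctness against the real model still requires comparing IDs with a trusted reference tokenizer.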

Max 2 attempts.

Author
Owner

Merged via squash in PR #14.

  • Config parsing with comptime assertions
  • BPE tokenizer with encode/decode
  • GQA attention layer with causal masking
  • SwiGLU MLP layer
  • Full model with embedding, stacked layers, forward pass
  • Model registry with factory pattern
  • 54/54 tests pass
  • zig build ✅, zig build test ✅, zig build lint ✅
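
As an illustration of the "GQA attention layer with causal masking" item, here is a minimal reference sketch in plain Python. It is not the merged code: it assumes standard grouped-query attention, where each group of `n_heads // n_kv_heads` query heads shares one key/value head, and a causal mask that lets position `t` attend only to positions `0..t`. The function name and tensor layout are hypothetical.

```python
import math


def gqa_attention(q, k, v, n_heads: int, n_kv_heads: int):
    """Causal grouped-query attention on nested lists.

    q: [n_heads][seq][d], k and v: [n_kv_heads][seq][d].
    Returns a nested list shaped [n_heads][seq][d].
    """
    if n_heads % n_kv_heads != 0:
        raise ValueError("n_heads must be a multiple of n_kv_heads")
    group = n_heads // n_kv_heads
    out = []
    for h in range(n_heads):
        kv = h // group  # KV head shared by this query head's group
        head_out = []
        for t, q_t in enumerate(q[h]):
            d = len(q_t)
            scale = 1.0 / math.sqrt(d)
            # Causal mask: only keys at positions 0..t are scored.
            scores = [
                scale * sum(a * b for a, b in zip(q_t, k[kv][s]))
                for s in range(t + 1)
            ]
            m = max(scores)  # subtract the max for a stable softmax
            weights = [math.exp(x - m) for x in scores]
            total = sum(weights)
            head_out.append([
                sum(w * v[kv][s][j] for s, w in enumerate(weights)) / total
                for j in range(d)
            ])
        out.append(head_out)
    return out
```

With all-zero queries the softmax is uniform, so each output row is the running mean of the value rows seen so far, which makes a convenient hand-checkable test case.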
Reference
sleepy/sleepy-llm#6