bug: generate command produces garbage output on Qwen3.5-0.8B #57

Closed
opened 2026-05-20 23:58:25 +02:00 by sleepy · 0 comments
Owner

Running sleepy-llm generate with Qwen3.5-0.8B produces garbled Unicode tokens instead of coherent text.

Reproduction

mkdir -p ~/.sleepy-llm/models/Qwen3.5-0.8B
ln -sf ~/.cache/huggingface/hub/models--Qwen--Qwen3.5-0.8B/snapshots/*/config.json ~/.sleepy-llm/models/Qwen3.5-0.8B/
ln -sf ~/.cache/huggingface/hub/models--Qwen--Qwen3.5-0.8B/snapshots/*/model.safetensors-* ~/.sleepy-llm/models/Qwen3.5-0.8B/
ln -sf ~/.cache/huggingface/hub/models--Qwen--Qwen3.5-0.8B/snapshots/*/tokenizer.json ~/.sleepy-llm/models/Qwen3.5-0.8B/
./zig-out/bin/sleepy-llm generate --model ~/.sleepy-llm/models/Qwen3.5-0.8B --prompt "Hello" --max-tokens 5

Result: åĴĮåĴĮåĴĮåĴĮåĴĮ (garbage)

Expected: Coherent English text like "Hello! How can I help..."
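
One observation on the result string: it looks like byte-level BPE vocabulary symbols that were never mapped back to raw bytes. The sketch below assumes the tokenizer uses the GPT-2 style byte-to-symbol table (which Qwen tokenizers generally do); `bytes_to_unicode` and `decode_byte_level` are illustrative names, not sleepy-llm functions. If that assumption holds, the output decodes to the same CJK character five times, which would suggest two separate things to check: a missing byte-level decode step in the detokenizer, and the model repeating a single token.

```python
def bytes_to_unicode() -> dict[int, str]:
    """GPT-2 style table mapping each byte value to a printable symbol."""
    # Bytes that are already printable keep their own code point.
    keep = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    table = {b: chr(b) for b in keep}
    # The remaining bytes get code points from 256 upward, in byte order.
    offset = 0
    for b in range(256):
        if b not in table:
            table[b] = chr(256 + offset)
            offset += 1
    return table


def decode_byte_level(symbols: str) -> str:
    """Map byte-level BPE symbols back to raw bytes, then decode as UTF-8."""
    symbol_to_byte = {s: b for b, s in bytes_to_unicode().items()}
    raw = bytes(symbol_to_byte[ch] for ch in symbols)
    return raw.decode("utf-8", errors="replace")


observed = "åĴĮ" * 5  # the string reported under "Result"
print(decode_byte_level(observed))  # prints: 和和和和和
```

Each three-symbol group maps to the bytes E5 92 8C, which is the UTF-8 encoding of U+548C. This only explains how the string is rendered; it does not explain why the model picks that token.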

Notes

  • The model weights load without errors (no safetensors parse failures)
  • Tokenizer appears to work (no encode/decode errors)
  • Could be: incorrect weight format parsing, wrong dtype conversion, Metal kernel producing incorrect results, or tokenizer vocab mismatch
  • Need to verify against reference mlx-lm output with the same seed

Related

  • Issue #55 (bench command also broken)
  • Correctness tests in src/tests/correctness.zig are also hardcoded to the 4B model
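
To narrow down the "weight format parsing" and "dtype conversion" hypotheses from the notes, a minimal sketch like the following could list which dtypes the symlinked shards actually contain. It relies only on the published safetensors layout (an 8-byte little-endian header length followed by a JSON header); the function names are hypothetical and nothing here comes from the sleepy-llm code base. The bfloat16 helper is included as a known-correct reference value to compare a conversion routine against, not as a claim about which dtype this checkpoint uses.

```python
import json
import struct
from collections import Counter
from pathlib import Path


def read_safetensors_header(path: Path) -> dict:
    """Return the JSON header of a .safetensors file without loading tensor data."""
    with open(path, "rb") as f:
        # First 8 bytes: little-endian u64 giving the header length in bytes.
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len))


def dtype_counts(header: dict) -> Counter:
    """Count tensors per dtype, skipping the optional __metadata__ entry."""
    return Counter(
        entry["dtype"] for name, entry in header.items() if name != "__metadata__"
    )


def bf16_bits_to_float(bits: int) -> float:
    """Widen one bfloat16 bit pattern to float32 by appending 16 zero bits."""
    return struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))[0]


if __name__ == "__main__":
    # Directory taken from the reproduction steps in this issue.
    model_dir = Path.home() / ".sleepy-llm" / "models" / "Qwen3.5-0.8B"
    for shard in sorted(model_dir.glob("*safetensors*")):
        print(shard.name, dict(dtype_counts(read_safetensors_header(shard))))
```

If the loop prints nothing, the glob in the `ln -sf` step did not match any shard, which would be worth ruling out before looking at kernels.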
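
For the mlx-lm comparison mentioned in the notes, a rough sketch could look like this. It assumes an Apple Silicon machine with `mlx-lm` installed and a version that supports this architecture, neither of which has been confirmed here. `reference_text` and `first_divergence` are hypothetical helper names. The sketch feeds the raw prompt without a chat template; whether that matches what `sleepy-llm generate` does is also an assumption and needs checking before the outputs are compared.

```python
from itertools import zip_longest


def first_divergence(reference, candidate):
    """Index of the first position where two sequences differ, or None if equal."""
    for i, (a, b) in enumerate(zip_longest(reference, candidate)):
        if a != b:
            return i
    return None


def reference_text(repo: str, prompt: str, max_tokens: int, seed: int = 0) -> str:
    """Generate text with mlx-lm to serve as the reference output."""
    # Imported lazily so first_divergence works without mlx installed.
    import mlx.core as mx
    from mlx_lm import generate, load

    mx.random.seed(seed)  # only matters if a sampling temperature is configured
    model, tokenizer = load(repo)
    return generate(model, tokenizer, prompt=prompt, max_tokens=max_tokens)
```

Calling `reference_text("Qwen/Qwen3.5-0.8B", "Hello", max_tokens=5)` and passing the result together with the sleepy-llm output to `first_divergence` gives the first differing character. Comparing token IDs instead of text would be more precise, since it separates model errors from detokenizer errors.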

Reference
sleepy/sleepy-llm#57