Decode produces repetitive output after first token #32
Problem
GPU-only decode path (branch perf/gpu-only-rewrite) produces the correct first token but then degenerates into repetitive output (. . ....). Debug output shows tokens oscillating between IDs 13 (",") and 220 ("."), then getting stuck on 13.
Prefill produces the correct first token (4858 = " everyone"), and the second decode token (13 = ",") also matches the reference. The third decode token (220 = ".") is wrong; it should be " I" or similar.
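To pin down exactly which step goes wrong, the generated token IDs can be compared against a reference run position by position. A minimal sketch in Python; first_divergence is a hypothetical helper, not part of sleepy-llm, and it assumes both runs are available as plain lists of token IDs.

```python
from typing import Optional, Sequence


def first_divergence(generated: Sequence[int], reference: Sequence[int]) -> Optional[int]:
    """Return the index of the first position where the two token streams differ.

    Returns None if the streams are identical. If one stream is a strict
    prefix of the other, returns the length of the shorter one.
    """
    for i, (g, r) in enumerate(zip(generated, reference)):
        if g != r:
            return i
    if len(generated) != len(reference):
        return min(len(generated), len(reference))
    return None
```

With the IDs from this issue, a generated stream starting [4858, 13, 220, ...] against a reference that agrees on 4858 and 13 but not on 220 reports index 2, the third token.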
Likely cause
Hypothesis: the K/V written by the first decode step corrupts the cache for subsequent steps. Possible causes:
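Whatever the specific cause, this hypothesis can be checked directly: snapshot the cache before a decode step and confirm that only the slot for the new position changed. A minimal sketch in Python, assuming a hypothetical layout of one vector per position; a real cache is split per layer and per head and lives on the GPU, so it would need to be copied back to the host first. The helper names are illustrative only.

```python
from typing import List

# Hypothetical layout: cache[position] -> stored K (or V) vector for that position.
Cache = List[List[float]]


def changed_positions(before: Cache, after: Cache) -> List[int]:
    """Return the positions whose stored vectors differ between two snapshots.

    Exact float equality is intentional: an untouched slot should be
    bit-identical after the step. Positions present in only one snapshot
    (e.g. a newly appended slot) count as changed.
    """
    changed = []
    for pos in range(max(len(before), len(after))):
        old = before[pos] if pos < len(before) else None
        new = after[pos] if pos < len(after) else None
        if old != new:
            changed.append(pos)
    return changed


def check_decode_write(before: Cache, after: Cache, new_pos: int) -> None:
    """Raise if a decode step modified any cache slot other than new_pos."""
    bad = [p for p in changed_positions(before, after) if p != new_pos]
    if bad:
        raise AssertionError(
            f"decode step at position {new_pos} overwrote positions {bad}"
        )
```

Running this after the first decode step would either rule the cache out or point at the exact slots being clobbered.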
Acceptance
Running

./zig-out/bin/sleepy-llm generate --model ~/.sleepy-llm/models/Qwen3.5-4B --prompt "Hello" --max-tokens 16

produces coherent English matching the reference output. Max 2 attempts.
Fixed by
72d908e. Root cause: residual_buf was shared across layers, and the MLP output was silently dropped: the residual was updated to input + attention but never had + mlp added. Fix: added a copy after the final residual_add in each layer. Decode now produces coherent multi-token output.
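The buffer handling behind this bug can be illustrated with a toy layer loop. A minimal sketch in Python, with the attention and MLP blocks replaced by hypothetical stand-in functions on plain lists; it mirrors only the buffer flow described above (a shared residual_buf updated to input + attention, with the final sum landing in a separate buffer), not the actual Zig/GPU code, and out_buf is an assumed name.

```python
from typing import Callable, List

Vec = List[float]


def residual_add(out: Vec, a: Vec, b: Vec) -> None:
    """out[i] = a[i] + b[i], written in place (like a kernel writing to a buffer)."""
    for i in range(len(out)):
        out[i] = a[i] + b[i]


def run_layers(x: Vec, attn: Callable[[Vec], Vec], mlp: Callable[[Vec], Vec],
               n_layers: int, fixed: bool) -> Vec:
    residual_buf = list(x)       # shared across all layers; next layer reads it
    out_buf = [0.0] * len(x)     # hypothetical buffer receiving the final sum
    for _ in range(n_layers):
        # First residual: residual_buf = input + attention
        residual_add(residual_buf, residual_buf, attn(residual_buf))
        # Second residual: out_buf = (input + attention) + mlp
        residual_add(out_buf, residual_buf, mlp(residual_buf))
        if fixed:
            # The fix: copy the final sum back so the next layer sees + mlp
            residual_buf[:] = out_buf
    return residual_buf
```

With stand-ins where attn always returns 1.0 and mlp always returns 10.0, two layers starting from x = [0.0] give [22.0] with the copy and [2.0] without it: the buggy path accumulates only the attention terms, so every layer after the first runs on a hidden state missing all MLP contributions.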