train_300m.py smoke test has input_ids/labels length mismatch #17
Problem
The smoke test's `TinyDataset` (`scripts/train_300m.py:107`) yields labels built as `seq[1:] + [0]`. This is inconsistent with `data.py`, which uses `input_ids == labels` (same tensor, same length) and then relies on the loss function's `logits[:, :-1]` vs `labels[:, 1:]` shift for teacher forcing.
The `seq[1:] + [0]` construction creates a different alignment: the last position gets label `0` (BOS/pad) instead of being part of the natural sequence shift. As a result, the loss is computed against a different target distribution than in the main data pipeline.
Impact
Action needed
Make the smoke test use the same `input_ids == labels` pattern as `data.py`.
Files
scripts/train_300m.py:107
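To make the alignment difference described under Problem concrete, here is a minimal pure-Python sketch. It is not the repository's code: the token ids, `shifted_targets`, and `make_item` are hypothetical names introduced for illustration, and it assumes the smoke test's labels pass through the same `logits[:, :-1]` vs `labels[:, 1:]` shift as the main pipeline.

```python
def shifted_targets(labels):
    """Targets seen by a loss that compares logits[:, :-1] with labels[:, 1:]."""
    return labels[1:]


def make_item(seq):
    """Hypothetical fixed dataset item: labels are the same sequence as input_ids."""
    return {"input_ids": list(seq), "labels": list(seq)}


seq = [11, 12, 13, 14]  # hypothetical token ids

# data.py pattern: labels == input_ids, the loss does the only shift.
main_labels = make_item(seq)["labels"]

# Smoke-test pattern from this issue: labels pre-shifted and padded with 0.
smoke_labels = seq[1:] + [0]

main_targets = shifted_targets(main_labels)    # positions 0..2 predict 12, 13, 14
smoke_targets = shifted_targets(smoke_labels)  # positions 0..2 predict 13, 14, 0

print("main :", main_targets)
print("smoke:", smoke_targets)
```

Under these assumptions the two patterns hand the loss different targets for the same input, which is the inconsistency this issue asks to remove. If the smoke test computes its loss without the shift, the exact targets differ from the sketch, but the trailing `0` label is still present.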