PackedDataset.__iter__ silently drops items when buffer < batch_size * 4 #13
Problem
PackedDataset.__iter__() (data.py:104-112) only yields items when len(buffer) >= batch_size * 4. If the final buffer is smaller than batch_size * 4, those items are silently dropped. For streaming datasets that end, the last few sequences could be lost.
Impact
Action needed
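One possible direction is to flush whatever is left in the buffer once the source is exhausted. The sketch below is an assumption about how the buffering works, since the actual code at data.py:104-112 is not reproduced in this issue. PackedDatasetSketch and flush_remainder are hypothetical names used only for illustration, and the real class may pack or shuffle items in ways this sketch ignores.

```python
from typing import Iterable, Iterator, List


class PackedDatasetSketch:
    """Hypothetical stand-in for PackedDataset; not the code in data.py."""

    def __init__(self, source: Iterable[int], batch_size: int,
                 flush_remainder: bool = True) -> None:
        self.source = source
        self.batch_size = batch_size
        # Hypothetical switch, present only so both behaviours can be compared.
        self.flush_remainder = flush_remainder

    def __iter__(self) -> Iterator[int]:
        threshold = self.batch_size * 4
        buffer: List[int] = []
        for item in self.source:
            buffer.append(item)
            # Condition described in the issue: emit only once the buffer is full.
            if len(buffer) >= threshold:
                yield from buffer
                buffer = []
        # Without this final flush, a trailing buffer smaller than
        # batch_size * 4 is discarded when the source ends.
        if self.flush_remainder and buffer:
            yield from buffer


# 10 items with a threshold of 4: two full buffers, then 2 leftover items.
dropped = list(PackedDatasetSketch(range(10), batch_size=1, flush_remainder=False))
kept = list(PackedDatasetSketch(range(10), batch_size=1, flush_remainder=True))
print(len(dropped), len(kept))  # 8 10
```

If downstream code relies on every yielded chunk holding at least batch_size * 4 items, flushing a short tail may need extra handling (for example padding), so whether to flush unconditionally is a design choice for the maintainers.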
Files
tergent/data.py:104-112