compress_token_ids uses Python hash() — non-deterministic across runs #9
Problem
compress_token_ids() (engram.py:31) uses Python's built-in hash() to map token strings to compressed IDs.
Python's hash() for strings is randomized per process unless PYTHONHASHSEED is set to a fixed value. The cache will therefore produce different results on each training run, making Engram lookups non-reproducible.
Impact
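The effect can be reproduced in isolation. The following is a minimal sketch, not code from the repository: the helper name string_hash_in_fresh_process is hypothetical, and it only shows that the same token string hashes differently in separate interpreter processes unless PYTHONHASHSEED is pinned.

```python
import os
import subprocess
import sys
from typing import Optional


def string_hash_in_fresh_process(token: str, seed: Optional[str] = None) -> int:
    """Return hash(token) as computed by a newly started Python interpreter.

    Hypothetical helper for illustration only. If `seed` is given it is
    passed as PYTHONHASHSEED; otherwise the variable is removed so the
    child process picks a random seed.
    """
    env = dict(os.environ)
    env.pop("PYTHONHASHSEED", None)
    if seed is not None:
        env["PYTHONHASHSEED"] = seed
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", token],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


if __name__ == "__main__":
    # Unpinned seed: the values are expected to differ between processes.
    print(string_hash_in_fresh_process("token"))
    print(string_hash_in_fresh_process("token"))
    # Pinned seed: the values match between processes.
    print(string_hash_in_fresh_process("token", seed="0"))
    print(string_hash_in_fresh_process("token", seed="0"))
```

Pinning PYTHONHASHSEED would mask the problem for a single environment, but it relies on every run being launched with the same setting, so it is a workaround rather than a fix.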
Action needed
Replace with a deterministic hash function. The file already has _murmur_hash(); use that instead, or switch to hashlib.md5.
Files
tergent/engram.py:31
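One possible shape for the hashlib.md5 option mentioned under Action needed is sketched below. This is an assumption-laden illustration, not the project's code: stable_token_id and num_buckets are hypothetical names, and the signature of the existing _murmur_hash() is not shown in this issue, so the sketch does not attempt to call it.

```python
import hashlib


def stable_token_id(token: str, num_buckets: int) -> int:
    """Map a token string to an ID in [0, num_buckets) deterministically.

    Hypothetical replacement for a hash()-based mapping. MD5 is used here
    only as a stable, non-cryptographic fingerprint; the result does not
    depend on PYTHONHASHSEED or on the process that computes it.
    """
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    digest = hashlib.md5(token.encode("utf-8")).digest()
    # Interpret the first 8 bytes as a big-endian unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_buckets


if __name__ == "__main__":
    # The same token maps to the same ID on every run and every machine.
    print(stable_token_id("hello", 50_000))
```

If _murmur_hash() already accepts the token representation used at engram.py:31, reusing it is likely the smaller change, since it avoids adding a second hashing scheme to the file.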