NGramHasher.lookup leaves early positions as index 0 — wrong embedding injected #10
Problem
NGramHasher.lookup() (engram.py:92-93) skips positions where there aren't enough tokens for an n-gram. All positions that can't form an n-gram keep hash index 0, so every "not enough context" position looks up the same embedding slot. That injects a real embedding value (whatever happens to be stored at index 0) rather than a neutral/zero signal, and it is indistinguishable from a genuine n-gram that hashes to 0.
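To make the failure mode concrete, here is a minimal pure-Python sketch. It does not reproduce the actual code in engram.py; `toy_hash`, `ngram_indices`, and the list-of-lists `table` are hypothetical stand-ins that only mirror the behavior described above (indices start at 0 and early positions are skipped).

```python
def toy_hash(ngram, table_size):
    """Hypothetical rolling hash; the real hasher may differ."""
    h = 0
    for token in ngram:
        h = (h * 31 + token) % table_size
    return h


def ngram_indices(tokens, n, table_size):
    """Mimics the reported behavior: skipped positions keep index 0."""
    indices = [0] * len(tokens)
    for i in range(n - 1, len(tokens)):
        indices[i] = toy_hash(tokens[i - n + 1 : i + 1], table_size)
    return indices


# Toy embedding table: row r holds the value r + 1, so row 0 is non-zero.
table = [[float(r + 1)] * 2 for r in range(16)]

indices = ngram_indices([5, 7, 9, 11], n=3, table_size=16)
embedded = [table[idx] for idx in indices]

# The first n - 1 positions have no full n-gram, yet they receive row 0.
print(indices[:2], embedded[:2])
```

With n = 3, the first two positions cannot form a trigram, but both still pick up the contents of row 0 of the table.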
Impact
Action needed
indices with a special "no-context" value (e.g., -1), and have EngramEmbedding handle it by returning zeros, OR
Files
tergent/engram.py:92-93
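The sentinel option under "Action needed" could look roughly like the sketch below. This is an illustration under stated assumptions, not the project's implementation: `NO_CONTEXT`, `toy_hash`, `ngram_indices_with_sentinel`, and `embed` are hypothetical names, and the embedding table is a plain list of lists rather than the real EngramEmbedding.

```python
NO_CONTEXT = -1  # hypothetical sentinel for "no full n-gram at this position"


def toy_hash(ngram, table_size):
    """Hypothetical rolling hash; the real hasher may differ."""
    h = 0
    for token in ngram:
        h = (h * 31 + token) % table_size
    return h


def ngram_indices_with_sentinel(tokens, n, table_size):
    """Positions without enough context keep the sentinel instead of 0."""
    indices = [NO_CONTEXT] * len(tokens)
    for i in range(n - 1, len(tokens)):
        indices[i] = toy_hash(tokens[i - n + 1 : i + 1], table_size)
    return indices


def embed(indices, table):
    """Return a zero vector for sentinel positions, the table row otherwise."""
    dim = len(table[0])
    return [
        [0.0] * dim if idx == NO_CONTEXT else list(table[idx])
        for idx in indices
    ]


table = [[float(r + 1)] * 2 for r in range(16)]
indices = ngram_indices_with_sentinel([5, 7, 9, 11], n=3, table_size=16)
out = embed(indices, table)
print(indices, out[0], out[2])
```

In a tensor-based implementation the sentinel would likely need to be turned into a mask before the lookup, since -1 is not a valid row of the table.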