NGramHasher.lookup uses Python loops over B×T — massive bottleneck #8

Status: open. Opened 2026-05-08 23:46:06 +02:00 by sleepy (repository owner).

Problem

NGramHasher.lookup() (engram.py:82-97) uses nested Python loops over batch, sequence, orders, and heads:

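```python
for b in range(B):                              # batch
    for t in range(T):                          # sequence position
        for oi, order in enumerate(self.n_orders):
            ...
            for h in range(self.n_heads):       # hash heads
                ...  # per-head hash + table lookup (body elided in the issue)
```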

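For batch=4, seq=2048, 2 orders and 8 heads, that is 4 × 2048 × 2 × 8 = 131,072 Python loop iterations per call. The work also runs on the CPU (.cpu().numpy()), which, when the model is on a GPU, forces a device-to-host copy and synchronization on every call. This loop will dominate training time.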
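Presumably the outer loops over `b` and `t` exist to gather the n-gram ending at each position. A minimal sketch of gathering those tuples for every position at once, without leaving the GPU, assuming token ids arrive as an int64 `[B, T]` tensor. The helper name `ngram_windows` and the convention of shifting ids by +1 so that 0 marks "before the start of the sequence" are hypothetical, not taken from engram.py.

```python
import torch

def ngram_windows(tokens: torch.Tensor, order: int) -> torch.Tensor:
    """Return the n-gram ending at every position, for all (b, t) at once.

    tokens: int64 [B, T] token ids; stays on its device (no .cpu()).
    returns: int64 [B, T, order]; the last slot is the token at position t.

    Hypothetical convention: ids are shifted by +1 so that 0 can mark
    positions before the start of the sequence.
    """
    B, _ = tokens.shape
    pad = tokens.new_zeros(B, order - 1)          # left padding for t < order - 1
    padded = torch.cat([pad, tokens + 1], dim=1)  # [B, T + order - 1]
    # unfold(dim, size, step) returns every length-`order` slice along dim 1
    # as a strided view: window t covers original positions t-order+1 .. t.
    return padded.unfold(1, order, 1)             # [B, T, order]
```

For B=4 and T=2048 this replaces the 8,192 iterations of the two outer loops with one `cat` and one strided view per order.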

Impact

Every training step calls this for every Engram layer. At seq=2048, this will be the bottleneck, potentially making training 10-100× slower than necessary.

Action needed

  • Vectorize the n-gram hashing using PyTorch tensor operations, OR
  • Move to a CUDA kernel / TorchScript implementation
  • Consider pre-computing n-gram tuples on GPU and hashing in batch
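A minimal sketch of the vectorization option, assuming a polynomial (Horner) hash with one multiplier per head and a fixed bucket count. The hash actually used in engram.py is not shown in this issue, so `hash_ngrams`, the per-head multiplier scheme and `table_size` are illustrative assumptions and will not reproduce the current bucket assignments. The function takes n-gram tuples as an int64 `[B, T, n]` tensor and hashes every position with every head in one pass. A plain-loop reference is included for checking.

```python
import torch

def hash_ngrams(windows: torch.Tensor, multipliers: torch.Tensor, table_size: int) -> torch.Tensor:
    """Hash all n-gram windows with all heads in one batched pass.

    windows:     int64 [B, T, n] n-gram tuples (one per position)
    multipliers: int64 [H], one multiplier per hash head, same device as windows
                 (illustrative scheme, not the one in engram.py)
    returns:     int64 [B, T, H] bucket indices in [0, table_size)

    Assumes table_size, multipliers and token ids are all < 2**31, so
    h * mult + tok stays below 2**63 and int64 never overflows.
    """
    B, T, n = windows.shape
    h = windows.new_zeros(B, T, multipliers.numel())
    # Horner's rule over the short n-gram axis: each step is one broadcast
    # op over all B*T positions and all H heads ([B,T,H] * [H] + [B,T,1]).
    for i in range(n):
        h = (h * multipliers + windows[..., i : i + 1]) % table_size
    return h


def hash_ngrams_loop(windows: torch.Tensor, multipliers: torch.Tensor, table_size: int) -> torch.Tensor:
    """Plain-loop reference (one Python iteration per position and head), for checking."""
    B, T, _ = windows.shape
    w, m = windows.tolist(), multipliers.tolist()
    out = torch.empty(B, T, len(m), dtype=torch.long)
    for b in range(B):
        for t in range(T):
            for k, mult in enumerate(m):
                acc = 0
                for tok in w[b][t]:
                    acc = (acc * mult + tok) % table_size
                out[b, t, k] = acc
    return out
```

Only the loop over the n-gram length remains in Python, so per-call Python overhead no longer scales with B × T × heads. Running this once per order and indexing an embedding table with the result keeps the whole lookup on the device.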

Files

  • tergent/engram.py:82-97
Reference: sleepy/ternary#8