EngramModule has double residual — one in Engram, one in TransformerBlock #16

Open
opened 2026-05-09 19:22:08 +02:00 by sleepy · 0 comments
Owner

Problem

EngramModule.forward() (engram.py:156) returns h + y (a residual connection). Then in model.py:130-133, the TransformerBlock also adds a residual:

# model.py: Engram injection
s = self.engrams[engram_key](s, compressed_ids)  # returns h + y

# model.py: block forward
attn_out, kv = self.attn(self.norm1(h_in), mask, past_kv)
x = x + attn_out  # second residual

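So the hidden state picks up two residual additions in sequence: one inside Engram and one in the TransformerBlock. Assuming x and h_in in the block both refer to the Engram output s = h + y, the block computes s + attn_out = h + y + attn_out, so the input h is carried forward unchanged past both the Engram processing and the block's attention/FFN.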
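To make the composition concrete, here is a minimal scalar sketch. The functions `engram_branch` and `attn_branch` are hypothetical stand-ins (simple scalings), not the code in `tergent/engram.py` or `tergent/model.py`. It only illustrates that stacking the two additions leaves an untouched copy of `h` in the output.

```python
# Toy scalar stand-ins; none of these functions come from the tergent code base.

def engram_branch(h):
    """Hypothetical stand-in for the Engram computation y = f(h)."""
    return 0.5 * h


def attn_branch(s):
    """Hypothetical stand-in for attn(norm1(s))."""
    return 0.25 * s


def engram_forward(h):
    # Mirrors the behaviour described in the issue: returns h + y.
    return h + engram_branch(h)


def block_forward(x):
    # Mirrors `x = x + attn_out` from the block forward.
    return x + attn_branch(x)


h = 2.0
s = engram_forward(h)        # first residual addition
out = block_forward(s)       # second residual addition

y = engram_branch(h)
attn_out = attn_branch(s)

# The output decomposes into the raw input plus the two branch outputs.
assert out == h + y + attn_out
```

With real tensors the same identity should hold elementwise, provided the block's `x` really is the Engram output.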

Impact

  • Not necessarily a bug, but architecturally unusual
  • May affect gradient flow — the signal from h reaches the next layer through two independent paths (Engram residual + block residual)
  • Could lead to different optimization dynamics than intended
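The gradient-flow point can be checked numerically on a toy model. The sketch below assumes scalar, differentiable stand-ins (`y_fn`, `a_fn`) for the Engram branch and the attention branch; both are hypothetical. It expands the chain rule for `out = s + a(s)` with `s = h + y(h)` into four terms and compares the sum against a central finite difference.

```python
import math


def y_fn(h):
    """Hypothetical Engram branch."""
    return math.tanh(h)


def a_fn(s):
    """Hypothetical attention branch (norm folded in)."""
    return 0.5 * math.sin(s)


def forward(h):
    s = h + y_fn(h)          # Engram residual
    return s + a_fn(s)       # block residual


def central_diff(f, x, eps=1e-6):
    """Central finite-difference estimate of df/dx."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)


h = 0.3
s = h + y_fn(h)
dy = 1.0 - math.tanh(h) ** 2     # derivative of y_fn at h
da = 0.5 * math.cos(s)           # derivative of a_fn at s

# d(out)/dh = (1 + da) * (1 + dy), expanded into its four paths.
identity_path = 1.0              # h skips both branches
engram_only = dy                 # through Engram, skipping attention
attn_only = da                   # skipping Engram, through attention
both = da * dy                   # through Engram, then attention

analytic = identity_path + engram_only + attn_only + both
numeric = central_diff(forward, h)
assert abs(analytic - numeric) < 1e-6
```

Under these assumptions the constant `1.0` term is the contribution that disappears if Engram returns `y` alone, since the factor `(1 + dy)` becomes `dy`.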

Action needed

Decide on intent:

  • If Engram is meant to be a preprocessing step: remove its internal residual (return y instead of h + y)
  • If Engram is meant to be a residual module: keep it, but document the double-residual design
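One way to keep both options available while the intent is decided is an explicit flag. The sketch below is an assumption, not the existing `EngramModule`: the class `ToyEngram`, its `use_residual` parameter, and the `branch` method are hypothetical names, and plain lists stand in for tensors.

```python
class ToyEngram:
    """Hypothetical stand-in for EngramModule; not the tergent implementation."""

    def __init__(self, use_residual=True):
        # True  -> residual module: forward returns h + y
        # False -> preprocessing step: forward returns y only
        self.use_residual = use_residual

    def branch(self, h):
        # Placeholder for the actual Engram computation.
        return [0.5 * v for v in h]

    def forward(self, h):
        y = self.branch(h)
        if self.use_residual:
            return [hv + yv for hv, yv in zip(h, y)]
        return y


h = [1.0, -2.0]
residual_out = ToyEngram(use_residual=True).forward(h)
preproc_out = ToyEngram(use_residual=False).forward(h)
```

Making the choice a named constructor argument would also document the design at the call site, whichever default is chosen.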

Files

  • tergent/engram.py:156
  • tergent/model.py:130-133
Reference
sleepy/ternary#16