is_full_attention flag is a no-op in TransformerBlock #7
Problem
TransformerBlock.is_full_attention is stored but never used in forward() (attention.py:109-123). All layers run identical GQA attention regardless of this flag.
Impact
The config says layers 0-15 should use "GatedDeltaNet V2" (per the comment in config.py), but that is never implemented. All 20 layers run the same attention, so both the flag and the full_attention_layers config are meaningless.
Action needed
full_attention_layers config if this is dead code
Files
tergent/attention.py:109-123
tergent/config.py:32 (full_attention_layers)
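
For reference, a minimal sketch of what "the flag actually being used" could look like. This is not the code in attention.py: SketchConfig, SketchBlock, and paths_by_layer are hypothetical names, the attention methods are placeholders that only return a label, and the assumption that layers 16-19 are the full-attention layers is inferred from "layers 0-15" and "20 layers" above rather than read from config.py.

```python
from dataclasses import dataclass


@dataclass
class SketchConfig:
    # Hypothetical stand-in for the real config object.
    num_layers: int = 20
    # Assumed: the layers NOT in 0-15 are the full-attention ones.
    full_attention_layers: tuple = (16, 17, 18, 19)


class SketchBlock:
    """Toy block whose forward() branches on is_full_attention."""

    def __init__(self, layer_idx, config):
        self.layer_idx = layer_idx
        self.is_full_attention = layer_idx in config.full_attention_layers

    def forward(self, x):
        # The point of the sketch: the stored flag selects the code path.
        if self.is_full_attention:
            return self._gqa_attention(x)
        return self._alternate_attention(x)

    def _gqa_attention(self, x):
        # Placeholder for the existing GQA attention path.
        return ("gqa", x)

    def _alternate_attention(self, x):
        # Placeholder for the unimplemented "GatedDeltaNet V2" path.
        return ("gated_deltanet_v2_placeholder", x)


def paths_by_layer(config):
    """Return the label of the path each layer takes, in layer order."""
    return [
        SketchBlock(i, config).forward(None)[0]
        for i in range(config.num_layers)
    ]
```

A check along the lines of `len(set(paths_by_layer(SketchConfig()))) > 1` could serve as a regression test: if every layer reports the same path, the flag is a no-op again.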