is_full_attention flag is a no-op in TransformerBlock #7

Open

opened 2026-05-08 23:45:31 +02:00 by sleepy · 0 comments

Owner

Problem

TransformerBlock.is_full_attention is stored but never used in forward() (attention.py:109-123). All layers run identical GQA attention regardless of this flag.
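One way to confirm this mechanically is a small static check that parses the class source and reports whether forward() ever reads the attribute. The sketch below uses only the standard-library `ast` module. The `SOURCE` string is a hypothetical stand-in for the real class, not a copy of attention.py, and `method_reads_attr` is an illustrative helper name rather than anything in the repository.

```python
import ast

# Hypothetical stand-in for the block in attention.py (NOT the real source):
# the flag is assigned in __init__ but never read in forward().
SOURCE = '''
class TransformerBlock:
    def __init__(self, is_full_attention):
        self.is_full_attention = is_full_attention

    def forward(self, x):
        return self.attn(x)
'''


def method_reads_attr(source: str, cls: str, method: str, attr: str) -> bool:
    """Return True if `method` of class `cls` reads `self.<attr>` anywhere.

    Returns False when the class or method is not found.
    """
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if not (isinstance(node, ast.ClassDef) and node.name == cls):
            continue
        for item in node.body:
            if not (isinstance(item, ast.FunctionDef) and item.name == method):
                continue
            for sub in ast.walk(item):
                if (
                    isinstance(sub, ast.Attribute)
                    and sub.attr == attr
                    and isinstance(sub.ctx, ast.Load)  # reads only, not assignments
                    and isinstance(sub.value, ast.Name)
                    and sub.value.id == "self"
                ):
                    return True
    return False


if __name__ == "__main__":
    used = method_reads_attr(SOURCE, "TransformerBlock", "forward", "is_full_attention")
    print("forward() reads is_full_attention:", used)
```

Passing the text of the real file instead of `SOURCE` would give the same kind of answer for the actual class, assuming the flag is accessed directly as `self.is_full_attention`.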

Impact

A comment in config.py says layers 0-15 should use "GatedDeltaNet V2", but that mechanism is never implemented. All 20 layers run the same attention, so neither the flag nor the full_attention_layers config has any effect.

Action needed

- Implement GatedDeltaNet V2 as the alternative attention mechanism, OR
- Remove the flag and full_attention_layers config if this is dead code
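If the first option is chosen, the flag would need to select a code path inside forward(). The following is a minimal sketch of that dispatch under stated assumptions. The constructor arguments `full_attn` and `linear_attn`, the `Mixer` type, and the use of plain Python lists in place of tensors are all hypothetical and chosen to keep the example dependency-free. It does not implement GatedDeltaNet V2 itself. Until a real alternative exists, raising an error on the unimplemented path makes the gap visible instead of silently running GQA everywhere.

```python
from typing import Callable, List, Optional

# Placeholder type (assumption): a "mixer" maps a sequence to a sequence.
Mixer = Callable[[List[float]], List[float]]


class TransformerBlock:
    """Illustrative block only; not the code in tergent/attention.py."""

    def __init__(
        self,
        is_full_attention: bool,
        full_attn: Mixer,
        linear_attn: Optional[Mixer] = None,
    ) -> None:
        self.is_full_attention = is_full_attention
        self.full_attn = full_attn      # stands in for the existing GQA attention
        self.linear_attn = linear_attn  # slot for a future GatedDeltaNet V2 module

    def forward(self, x: List[float]) -> List[float]:
        # The flag now actually selects the code path.
        if self.is_full_attention:
            return self.full_attn(x)
        if self.linear_attn is None:
            # Fail loudly rather than silently falling back to full attention.
            raise NotImplementedError(
                "is_full_attention=False requires an alternative attention module"
            )
        return self.linear_attn(x)
```

With this shape, a block configured for the alternative mechanism cannot be constructed and run without one being supplied, which would have surfaced the problem described here at the first forward pass.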

Files

- tergent/attention.py:109-123
- tergent/config.py:32 (full_attention_layers)

Reference

sleepy/ternary#7
