Metal kernels for GPU linear attention (conv1d, delta rule, norm/gate) #35
Objective
Write two Metal compute kernels for linear attention decode (seq_len=1) so that all state stays on the GPU, eliminating all CPU state management.
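For orientation, here is a minimal CPU reference sketch of what one decode step could compute, which may be useful for checking kernel output. It is not taken from this issue: the function names, the state layouts (oldest-first conv window, `S` stored as `[d_v][d_k]`), the depthwise convolution, and the plain (ungated) delta rule `S_new = S + beta * (v - S k) k^T` are all assumptions. It uses Python floats rather than bf16 and leaves out the norm/gate stage.

```python
def conv1d_state_step(state, x, weight):
    """One decode step of a causal depthwise conv1d (assumed layout).

    state:  [channels][K-1] previous inputs, oldest first
    x:      [channels]      new input for this token
    weight: [channels][K]   filter taps, oldest first
    Returns (y, new_state).
    """
    y, new_state = [], []
    for c in range(len(x)):
        window = state[c] + [x[c]]           # K most recent inputs
        y.append(sum(w * v for w, v in zip(weight[c], window)))
        new_state.append(window[1:])         # drop the oldest input
    return y, new_state


def delta_rule_step(S, q, k, v, beta):
    """One decode step of the plain delta rule (assumed, ungated form).

    S: [d_v][d_k] recurrent state, q/k: [d_k], v: [d_v], beta: scalar.
    Returns (o, S_new) with S_new = S + beta * (v - S k) k^T and o = S_new q.
    """
    d_v, d_k = len(v), len(k)
    # Current prediction of v from k under the old state.
    pred = [sum(S[i][j] * k[j] for j in range(d_k)) for i in range(d_v)]
    # Rank-1 correction toward the target value.
    S_new = [
        [S[i][j] + beta * (v[i] - pred[i]) * k[j] for j in range(d_k)]
        for i in range(d_v)
    ]
    o = [sum(S_new[i][j] * q[j] for j in range(d_k)) for i in range(d_v)]
    return o, S_new
```

In a GPU version, both `state` and `S` would presumably live in persistent device buffers that the kernels update in place, so nothing is copied back to the host between tokens.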
Kernel 1: conv1d_state_bf16
Kernel 2: linear_attn_delta_rule_bf16
Files to create/modify
Acceptance
Constraints