matlok's Collections

Papers - Attention - Mixture of Attention Heads (MoA)

Generalized multi-head attention using RoPE
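As a minimal sketch of the idea behind this collection, the snippet below applies rotary position embeddings (RoPE) to per-head query and key tensors and checks RoPE's defining property: the query-key score depends only on the relative offset between positions. All names (`rope`, shapes, the `base` frequency) are illustrative assumptions, not taken from any specific paper or library in the collection.

```python
import numpy as np

def rope(x, base=10000.0):
    """Apply RoPE to x of shape (seq_len, num_heads, head_dim).

    Illustrative sketch: pairs dimension i with dimension i + head_dim//2
    and rotates each pair by a position-dependent angle.
    """
    seq_len, num_heads, head_dim = x.shape
    half = head_dim // 2
    # One rotation frequency per dimension pair.
    freqs = base ** (-np.arange(half) / half)           # (half,)
    angles = np.arange(seq_len)[:, None] * freqs[None]  # (seq_len, half)
    cos = np.cos(angles)[:, None, :]                    # broadcast over heads
    sin = np.sin(angles)[:, None, :]
    x1, x2 = x[..., :half], x[..., half:]
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)

# Place the SAME query/key content at every position, then compare scores
# at equal relative offsets: they should match (relative-position property).
rng = np.random.default_rng(0)
qv = rng.normal(size=(4, 64))                       # one vector per head
kv = rng.normal(size=(4, 64))
x_q = np.broadcast_to(qv, (8, 4, 64)).copy()
x_k = np.broadcast_to(kv, (8, 4, 64)).copy()
rq, rk = rope(x_q), rope(x_k)
s1 = np.einsum('hd,hd->h', rq[3], rk[5])            # offset 2
s2 = np.einsum('hd,hd->h', rq[4], rk[6])            # also offset 2
assert np.allclose(s1, s2)
```

In a mixture-of-attention-heads setting, a router would select which heads contribute for each token; the rotation above is applied per head before the query-key dot product either way.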