[2605.02144] Projection-Free Transformers via Gaussian Kernel Attention
... (QK^\top/\sqrt{d})V$, where $Q=XW_Q$, $K=XW_K$, and $V=XW_V$ are learned linear projecti...
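
For orientation, here is a minimal sketch of the standard attention computation that the truncated formula appears to describe. It assumes the elided prefix is the usual row-wise softmax, which the snippet itself does not show. The helper names (`matmul`, `softmax`, `attention`) and the toy inputs are illustrative assumptions, and the code does not implement the Gaussian kernel variant named in the title.

```python
import math

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(s - m) for s in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(X, W_Q, W_K, W_V):
    """Scaled dot-product attention with Q = X W_Q, K = X W_K, V = X W_V."""
    Q, K, V = matmul(X, W_Q), matmul(X, W_K), matmul(X, W_V)
    d = len(Q[0])  # query/key width used in the 1/sqrt(d) scaling
    K_T = [list(col) for col in zip(*K)]  # transpose of K
    scores = [[s / math.sqrt(d) for s in row] for row in matmul(Q, K_T)]
    weights = [softmax(row) for row in scores]  # assumed softmax; each row sums to 1
    return matmul(weights, V), weights

# Toy input: 3 tokens of width 2, identity projections (illustrative values only).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
I2 = [[1.0, 0.0], [0.0, 1.0]]
out, weights = attention(X, I2, I2, I2)
```

With identity projections, each output row is a convex combination of the rows of `X`, which makes the role of the three learned matrices easy to see: they only change which space the similarities and the mixed values are computed in.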