Linear attention is (maybe) all you need (to understand transformer optimization) — arXiv:2310.01082, published Oct 2, 2023