- Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention (paper 2502.11089, published Feb 16, 2025)
- LeMo: Enabling LEss Token Involvement for MOre Context Fine-tuning (paper 2501.09767, published Jan 15, 2025)
- Perceiver IO: a scalable, fully-attentional model that works on any modality (article, Dec 15, 2021)