jzwong
AI & ML interests: None yet
Organizations: None yet
SYS
- Tensor Product Attention Is All You Need
  Paper • 2501.06425 • Published • 89
- Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models
  Paper • 2501.11873 • Published • 66
- Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
  Paper • 2502.11089 • Published • 165
- MoBA: Mixture of Block Attention for Long-Context LLMs
  Paper • 2502.13189 • Published • 17
LLM
- REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
  Paper • 2501.03262 • Published • 103
- MiniMax-01: Scaling Foundation Models with Lightning Attention
  Paper • 2501.08313 • Published • 300
- Towards Best Practices for Open Datasets for LLM Training
  Paper • 2501.08365 • Published • 63
- Qwen2.5-1M Technical Report
  Paper • 2501.15383 • Published • 72
Novel
Survey