jzwong
AI & ML interests: None yet
Organizations: None yet
SYS
- Tensor Product Attention Is All You Need
  Paper • 2501.06425 • Published • 89
- Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models
  Paper • 2501.11873 • Published • 66
- Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
  Paper • 2502.11089 • Published • 165
- MoBA: Mixture of Block Attention for Long-Context LLMs
  Paper • 2502.13189 • Published • 17
LLM
- REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
  Paper • 2501.03262 • Published • 103
- MiniMax-01: Scaling Foundation Models with Lightning Attention
  Paper • 2501.08313 • Published • 300
- Towards Best Practices for Open Datasets for LLM Training
  Paper • 2501.08365 • Published • 63
- Qwen2.5-1M Technical Report
  Paper • 2501.15383 • Published • 72
Novel
Survey