39 34 30

Shizhe Diao

shizhediao2

https://shizhediao.github.io/

AI & ML interests

LLM pre-training and reasoning

Recent Activity

upvoted a paper about 2 hours ago

GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization

liked a model 2 days ago

nvidia/Nemotron-Flash-1B

updated a dataset 23 days ago

nvidia/ToolScale

View all activity

Organizations

upvoted a paper about 2 hours ago

GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization

Paper • 2601.05242 • Published about 11 hours ago • 49

upvoted 2 papers about 1 month ago

Nemotron-Flash: Towards Latency-Optimal Hybrid Small Language Models

Paper • 2511.18890 • Published Nov 24, 2025 • 32

ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration

Paper • 2511.21689 • Published Nov 26, 2025 • 114

upvoted 5 papers 3 months ago

Unified Reinforcement and Imitation Learning for Vision-Language Models

Paper • 2510.19307 • Published Oct 22, 2025 • 30

DLER: Doing Length pEnalty Right - Incentivizing More Intelligence per Token via Reinforcement Learning

Paper • 2510.15110 • Published Oct 16, 2025 • 15

OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM

Paper • 2510.15870 • Published Oct 17, 2025 • 89

QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs

Paper • 2510.11696 • Published Oct 13, 2025 • 177

GAR: Generative Adversarial Reinforcement Learning for Formal Theorem Proving

Paper • 2510.11769 • Published Oct 13, 2025 • 25

upvoted an article 3 months ago

Article

Finally, a Replacement for BERT: Introducing ModernBERT

Dec 19, 2024

•

718

upvoted 2 papers 3 months ago

BroRL: Scaling Reinforcement Learning via Broadened Exploration

Paper • 2510.01180 • Published Oct 1, 2025 • 18

DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search

Paper • 2509.25454 • Published Sep 29, 2025 • 141

upvoted an article 4 months ago

Article

Introducing smolagents: simple agents that write actions in code.

Dec 31, 2024

•

1.16k

upvoted a paper 4 months ago

Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training

Paper • 2509.03403 • Published Sep 3, 2025 • 22

upvoted a paper 5 months ago

Beyond Ten Turns: Unlocking Long-Horizon Agentic Search with Large-Scale Asynchronous RL

Paper • 2508.07976 • Published Aug 11, 2025 • 51

upvoted 4 papers 7 months ago

Chain-of-Experts: Unlocking the Communication Power of Mixture-of-Experts Models

Paper • 2506.18945 • Published Jun 23, 2025 • 40

Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation

Paper • 2506.09991 • Published Jun 11, 2025 • 55

ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

Paper • 2505.24864 • Published May 30, 2025 • 143

Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding

Paper • 2505.22618 • Published May 28, 2025 • 44

upvoted 2 papers 8 months ago

Distilling LLM Agent into Small Models with Retrieval and Code Tools

Paper • 2505.17612 • Published May 23, 2025 • 81

MMLongBench: Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly

Paper • 2505.10610 • Published May 15, 2025 • 54