- E-GRPO: High Entropy Steps Drive Effective Reinforcement Learning for Flow Models
  Paper • 2601.00423 • Published • 9
- GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
  Paper • 2601.05242 • Published • 222
- FP8-RL: A Practical and Stable Low-Precision Stack for LLM Reinforcement Learning
  Paper • 2601.18150 • Published • 6
- DenseGRPO: From Sparse to Dense Reward for Flow Matching Model Alignment
  Paper • 2601.20218 • Published • 15
yangh (hac)
AI & ML interests: None yet
Recent Activity:
- updated a collection "RL" 8 days ago
- updated a collection "RL" 8 days ago
- updated a collection "RL" 11 days ago
Organizations: None yet