My AI
FlowRL: Matching Reward Distributions for LLM Reasoning (Paper • 2509.15207 • Published Sep 18 • 112)
Kwaipilot/KAT-Dev-72B-Exp (Text Generation • 73B • Updated Oct 13 • 1.41k • 148)
Agentic Entropy-Balanced Policy Optimization (Paper • 2510.14545 • Published about 1 month ago • 102)