Accelerating RL for LLM Reasoning with Optimal Advantage Regression
Paper • 2505.20686 • Published • 2
Cornell-AGI/gsm8k_size_qwen2.5_1.5b_eval
Viewer • Updated • 7.47k • 39
Cornell-AGI/gsm8k_size_qwen2.5_3b_eval
Viewer • Updated • 7.47k • 8
Cornell-AGI/gsm8k_size_qwen2.5_7b_eval
Viewer • Updated • 7.47k • 10
Cornell-AGI
university
AI & ML interests: Reinforcement Learning from Human Feedback
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
Paper • 2410.04612 • Published
Cornell-AGI/REFUEL-Llama-3-Armo-iter_1
8B • Updated
Cornell-AGI/REFUEL-Llama-3-Armo-iter_2
8B • Updated
Cornell-AGI/REFUEL-Ultrainteract-Llama-3-Armo-iter_1
Viewer • Updated • 64.6k • 16 • 2
Models (20)
Cornell-AGI/apo_math_qwen2.5_1.5b
Text Generation • 2B • Updated • 3
Cornell-AGI/ppo_math_qwen2.5_1.5b
Text Generation • 2B • Updated • 27
Cornell-AGI/rebel_math_qwen2.5_1.5b
Text Generation • 2B • Updated • 2
Cornell-AGI/grpo_math_qwen2.5_3b
Text Generation • 3B • Updated • 2
Cornell-AGI/grpo_math_qwen2.5_1.5b
Text Generation • 2B • Updated • 5
Cornell-AGI/ppo_math_qwen2.5_3b
Text Generation • 3B • Updated • 10
Cornell-AGI/rebel_math_qwen2.5_3b
Text Generation • 3B • Updated
Cornell-AGI/apo_math_qwen2.5_3b
Text Generation • 3B • Updated • 5
Cornell-AGI/grpo_math_qwen2.5_7b
Text Generation • 8B • Updated • 4
Cornell-AGI/ppo_math_qwen2.5_7b
Text Generation • 8B • Updated • 17
Datasets (15)
Cornell-AGI/math_size_qwen2.5_7b_eval
Viewer • Updated • 7.5k • 28
Cornell-AGI/math_size_qwen2.5_3b_eval
Viewer • Updated • 7.5k • 10
Cornell-AGI/math_size_qwen2.5_1.5b_eval
Viewer • Updated • 7.5k • 120
Cornell-AGI/gsm8k_size_qwen2.5_7b_eval
Viewer • Updated • 7.47k • 10
Cornell-AGI/gsm8k_size_qwen2.5_3b_eval
Viewer • Updated • 7.47k • 8
Cornell-AGI/gsm8k_size_qwen2.5_1.5b_eval
Viewer • Updated • 7.47k • 39
Cornell-AGI/amazon_movie_tv_item_mxbai
Viewer • Updated • 10.5k • 18
Cornell-AGI/amazon_movie_tv_llama_mxbai
Viewer • Updated • 17.1k • 121
Cornell-AGI/REFUEL-Ultrainteract-Llama-3-Armo-iter_2
Viewer • Updated • 116k • 35 • 1
Cornell-AGI/REFUEL-Ultrainteract-Llama-3-Armo-iter_1
Viewer • Updated • 64.6k • 16 • 2