Models from the paper "LaSeR: Reinforcement Learning with Last-Token Self-Rewarding"
Wenkai Yang
Keven16
AI & ML interests
None yet
Recent Activity
Commented on a paper (18 days ago):
Critique-RL: Training Language Models for Critiquing through Two-Stage Reinforcement Learning
Authored a paper (about 1 month ago):
LaSeR: Reinforcement Learning with Last-Token Self-Rewarding
Organizations
None yet