Models used in CHARM: Calibrating Reward Models With Chatbot Arena Scores.
shawnxzhu
AI & ML interests
None yet
Recent Activity
Upvoted a paper 19 days ago
QueST: Incentivizing LLMs to Generate Difficult Problems
Upvoted a paper 2 months ago
Depth-Breadth Synergy in RLVR: Unlocking LLM Reasoning Gains with Adaptive Exploration
Organizations
None yet