Beyond Length Scaling: Synergizing Breadth and Depth for Generative Reward Models • Paper • 2603.01571 • Published 8 days ago • 32
RubricBench: Aligning Model-Generated Rubrics with Human Standards • Paper • 2603.01562 • Published 8 days ago • 55
🧠 Reasoning datasets • Collection • Datasets with reasoning traces for math and code released by the community • 24 items • Updated May 19, 2025 • 183