- OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling
  Paper • 2506.20512 • Published • 47
- Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation
  Paper • 2509.15194 • Published • 33
- Revisiting Multi-Agent Debate as Test-Time Scaling: A Systematic Study of Conditional Effectiveness
  Paper • 2505.22960 • Published • 16
Zeyu Qin
qqqzzzyyy
AI & ML interests
Scalable Oversight, AI safety
Recent Activity
- Upvoted a paper (11 days ago): The End of Manual Decoding: Towards Truly End-to-End Language Models
- Upvoted a paper (about 1 month ago): MLE-Smith: Scaling MLE Tasks with Automated Multi-Agent Pipeline
- Upvoted a collection (about 1 month ago): AceReason
Organizations
None yet