When and How Much to Imagine: Adaptive Test-Time Scaling with World Models for Visual Spatial Reasoning Paper • 2602.08236 • Published 8 days ago • 9
PingPong: A Natural Benchmark for Multi-Turn Code-Switching Dialogues Paper • 2601.17277 • Published 24 days ago • 6
rubricreward/mR3-Qwen3-14B-tgt-prompt-tgt-thinking-translated Text Generation • 15B • Updated Oct 2, 2025 • 4
rubricreward/mR3-Qwen3-8B-tgt-prompt-tgt-thinking-translated Text Generation • 8B • Updated Oct 2, 2025 • 2
rubricreward/mR3-Qwen3-4B-tgt-prompt-tgt-thinking-translated Text Generation • 4B • Updated Oct 2, 2025 • 6