AI & ML interests
Data-centric AI, LLM, MLLM
Organization Card
🌐 About OpenDataArena
OpenDataArena (ODA) is an open research initiative devoted to evaluating, benchmarking, and creating high-value datasets for the post-training era of large language models (LLMs).
We believe data quality defines model capability — and that open, reproducible evaluation is key to accelerating progress in AI.
🚀 Our Mission
To make data evaluation scientific, transparent, and community-driven, while continuously producing high-value, openly available datasets that enhance model alignment and reasoning ability.
🔑 Key Features
- 🏆 Dataset Leaderboard: ranks the most valuable datasets across multiple domains, based on diverse benchmarks.
- 📊 Comprehensive Scoring System: measures dataset quality, diversity, and learning value using reproducible pipelines.
- 🧰 Open-Source Toolkit: OpenDataArena-Tool supports dataset evaluation and scoring through a standardized, community-driven workflow.
- 🌱 High-Value Data Generation: beyond evaluation, ODA continuously produces and shares new, top-quality datasets for fine-tuning and alignment research.
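As a rough illustration of the leaderboard idea above, the sketch below ranks datasets by the mean benchmark score of models fine-tuned on them. This is not ODA's actual scoring pipeline: the dataset names, benchmark names, numbers, and the `rank_datasets` helper are all hypothetical placeholders, and simple averaging is only one possible aggregation.

```python
from statistics import mean

# Hypothetical per-benchmark scores for models fine-tuned on each dataset.
# Names and numbers are illustrative placeholders, not ODA results.
scores = {
    "dataset_a": {"math": 0.62, "code": 0.48, "general": 0.71},
    "dataset_b": {"math": 0.55, "code": 0.60, "general": 0.69},
    "dataset_c": {"math": 0.40, "code": 0.35, "general": 0.66},
}


def rank_datasets(scores):
    """Return (dataset, mean score) pairs sorted from best to worst."""
    averaged = {
        name: mean(per_benchmark.values())
        for name, per_benchmark in scores.items()
    }
    return sorted(averaged.items(), key=lambda item: item[1], reverse=True)


leaderboard = rank_datasets(scores)
for position, (name, average) in enumerate(leaderboard, start=1):
    print(f"{position}. {name}: {average:.3f}")
```

A real leaderboard would likely weight or normalize benchmarks per domain rather than take a plain mean; the plain mean is used here only to keep the sketch short.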
If you find our work helpful, please consider ⭐ starring and subscribing to support open, data-driven AI research. Learn more at opendataarena.github.io.
OpenDataArena is part of OpenDataLab.
High-quality STEM reasoning dataset for Multimodal LLM post-training.
- MMFineReason: Closing the Multimodal Reasoning Gap via Open Data-Centric Methods
  Paper • 2601.21821 • Published • 57
- OpenDataArena/MMFineReason-1.8M-Qwen3-VL-235B-Thinking
  Viewer • Updated • 1.81M • 3.28k • 110
- OpenDataArena/MMFineReason-SFT-586K-Qwen3-VL-235B-Thinking
  Viewer • Updated • 586k • 466 • 3
- OpenDataArena/MMFineReason-SFT-123K-Qwen3-VL-235B-Thinking
  Viewer • Updated • 123k • 671 • 67
High-quality mixture datasets for post-training covering multiple domains.
- OpenDataArena/ODA-Mixture-500k
  Viewer • Updated • 506k • 3.07k • 121
- OpenDataArena/ODA-Mixture-100k
  Viewer • Updated • 101k • 1.64k • 95
- OpenDataArena/Qwen2.5-7B-ODA-Mixture-500k
  Text Generation • 333k • Updated • 10 • 1
- OpenDataArena/Qwen2.5-7B-ODA-Mixture-100k
  Text Generation • 333k • Updated • 11
Models (9)
- OpenDataArena/MMFineReason-4B
  Visual Question Answering • 5B • Updated • 54 • 13
- OpenDataArena/MMFineReason-2B
  Visual Question Answering • 2B • Updated • 25 • 8
- OpenDataArena/MMFineReason-8B
  Visual Question Answering • 9B • Updated • 19 • 8
- OpenDataArena/Qwen3-8B-ODA-Math-460k
  Text Generation • 308k • Updated • 14 • 1
- OpenDataArena/Qwen2.5-7B-ODA-Math-460k
  Text Generation • 8B • Updated • 15
- OpenDataArena/Qwen3-8B-ODA-Mixture-100k
  Text Generation • 308k • Updated • 44 • 1
- OpenDataArena/Qwen3-8B-ODA-Mixture-500k
  Text Generation • 308k • Updated • 20
- OpenDataArena/Qwen2.5-7B-ODA-Mixture-100k
  Text Generation • 333k • Updated • 11
- OpenDataArena/Qwen2.5-7B-ODA-Mixture-500k
  Text Generation • 333k • Updated • 10 • 1
Datasets (9)
- OpenDataArena/MMFineReason-SFT-123K-Qwen3-VL-235B-Thinking
  Viewer • Updated • 123k • 671 • 67
- OpenDataArena/MMFineReason-SFT-586K-Qwen3-VL-235B-Thinking
  Viewer • Updated • 586k • 466 • 3
- OpenDataArena/MMFineReason-Full-2.3M-Qwen3-VL-235B-Thinking
  Viewer • Updated • 2.29M • 2.95k • 59
- OpenDataArena/MMFineReason-1.8M-Qwen3-VL-235B-Thinking
  Viewer • Updated • 1.81M • 3.28k • 110
- OpenDataArena/ODA-Math-460k
  Viewer • Updated • 460k • 3.81k • 103
- OpenDataArena/ODA-Mixture-100k
  Viewer • Updated • 101k • 1.64k • 95
- OpenDataArena/ODA-Mixture-500k
  Viewer • Updated • 506k • 3.07k • 121
- OpenDataArena/OpenDataArena-scored-data
  Viewer • Updated • 15.7M • 11.2k • 10
- OpenDataArena/MathLake
  Viewer • Updated • 8.31M • 1.22k • 21