---
license: apache-2.0
base_model:
- OpenGVLab/InternVL3-2B
---
**EN** | [中文](README_CN.md)
# SenseNova-SI: Scaling Spatial Intelligence with Multimodal Foundation Models
🔥Please check out our newly released [**SenseNova-SI-1.1-InternVL3-2B**](https://huggingface.co/sensenova/SenseNova-SI-1.1-InternVL3-2B) and
[**SenseNova-SI-1.1-InternVL3-8B**](https://huggingface.co/sensenova/SenseNova-SI-1.1-InternVL3-8B).
⏳***The current model will be deprecated in due course.***
## Overview
Despite remarkable progress, leading multimodal models still exhibit notable deficiencies in spatial intelligence:
the ability to make metric estimations, understand spatial relationships, handle viewpoint changes, and integrate information across complex scenes.
We take a scaling perspective: we construct and curate a large-scale, comprehensive collection of spatial intelligence data
and, through continued training on powerful multimodal foundation models,
cultivate multi-faceted spatial understanding in the SenseNova-SI family of models.
*In the future, SenseNova-SI will be integrated with larger-scale in-house models.*
## Release Information
Currently, we build SenseNova-SI upon popular open-source foundation models to maximize compatibility with existing research pipelines.
In this release, we present
[**SenseNova-SI-InternVL3-2B**](https://huggingface.co/sensenova/SenseNova-SI-InternVL3-2B) and
[**SenseNova-SI-InternVL3-8B**](https://huggingface.co/sensenova/SenseNova-SI-InternVL3-8B),
which achieve state-of-the-art performance among open-source models of comparable size across four recent spatial intelligence benchmarks:
**VSI**, **MMSI**, **MindCube**, and **ViewSpatial**.
| Model | VSI | MMSI | MindCube-Tiny | ViewSpatial |
|---|---|---|---|---|
| Open-source Models (~2B) | | | | |
| InternVL3-2B | 32.98 | 26.50 | 37.50 | 32.56 |
| Qwen3-VL-2B-Instruct | 50.36 | 28.90 | 34.52 | 36.97 |
| MindCube-3B-RawQA-SFT | 17.24 | 1.70 | 51.73 | 24.14 |
| MindCube-3B-Aug-CGMap-FFR-Out-SFT | 29.60 | 29.10 | 41.06 | 30.90 |
| MindCube-3B-Plain-CGMap-FFR-Out-SFT | 29.93 | 30.40 | 39.90 | 31.20 |
| SpatialLadder-3B | 44.86 | 27.40 | 43.46 | 39.85 |
| SpatialMLLM-4B | 45.98 | 26.10 | 33.46 | 34.66 |
| SenseNova-SI-InternVL3-2B | 58.47 | 35.50 | 71.35 | 40.62 |
| Open-source Models (~8B) | | | | |
| InternVL3-8B | 42.14 | 28.00 | 41.54 | 38.66 |
| Qwen3-VL-8B-Instruct | 57.90 | 31.10 | 29.42 | 42.20 |
| BAGEL-7B | 30.90 | 33.10 | 34.71 | 41.32 |
| SpaceR-7B | 36.29 | 27.40 | 37.98 | 35.85 |
| ViLaSR-7B | 44.63 | 30.20 | 35.10 | 35.71 |
| SenseNova-SI-InternVL3-8B | 62.80 | 37.90 | 89.33 | 53.92 |
| Proprietary Models | | | | |
| Gemini-2.5-pro-2025-06 | 53.57 | 38.00 | 57.60 | 46.06 |
| Grok-4-2025-07-09 | 47.92 | 37.80 | 63.56 | 43.23 |
| GPT-5-2025-08-07 | 55.03 | 41.80 | 56.30 | 45.59 |
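
The gains over the base models implied by the table can be tallied with a few lines of Python. This is a minimal sketch for readers who want to reproduce the arithmetic, not part of the release: the `SCORES` dictionary and the `gains` helper are illustrative names, and the numbers are copied by hand from the rows above.

```python
# Illustrative only: scores copied by hand from the table above,
# in the order (VSI, MMSI, MindCube-Tiny, ViewSpatial).
BENCHMARKS = ("VSI", "MMSI", "MindCube-Tiny", "ViewSpatial")

SCORES = {
    "InternVL3-2B": (32.98, 26.50, 37.50, 32.56),
    "SenseNova-SI-InternVL3-2B": (58.47, 35.50, 71.35, 40.62),
    "InternVL3-8B": (42.14, 28.00, 41.54, 38.66),
    "SenseNova-SI-InternVL3-8B": (62.80, 37.90, 89.33, 53.92),
}


def gains(base, tuned):
    """Return the absolute point gain of `tuned` over `base` per benchmark."""
    return {
        name: round(t - b, 2)
        for name, b, t in zip(BENCHMARKS, SCORES[base], SCORES[tuned])
    }


if __name__ == "__main__":
    for size in ("2B", "8B"):
        base = f"InternVL3-{size}"
        tuned = f"SenseNova-SI-InternVL3-{size}"
        print(size, gains(base, tuned))
```

Under these numbers, the largest gains at both model sizes are on MindCube-Tiny, and the smallest are on ViewSpatial (2B) and MMSI (8B).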