Commit e7809ba
Parent(s): a8916ec

init

Files changed:
- .gitattributes +11 -0
- README.md +71 -1
- assets/icon.png +3 -0
- assets/method.png +3 -0
- assets/teaser.png +3 -0
- chat_template.json +3 -0
- config.json +3 -0
- dino_config.json +3 -0
- generation_config.json +3 -0
- preprocessor_config.json +3 -0
- text_config.json +3 -0
- tokenizer.json +3 -0
- tokenizer_config.json +3 -0
- vit_config.json +3 -0
- vocab.json +3 -0
.gitattributes CHANGED

@@ -33,3 +33,14 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+generation_config.json filter=lfs diff=lfs merge=lfs -text
+preprocessor_config.json filter=lfs diff=lfs merge=lfs -text
+text_config.json filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
+vit_config.json filter=lfs diff=lfs merge=lfs -text
+chat_template.json filter=lfs diff=lfs merge=lfs -text
+config.json filter=lfs diff=lfs merge=lfs -text
+dino_config.json filter=lfs diff=lfs merge=lfs -text
+tokenizer_config.json filter=lfs diff=lfs merge=lfs -text
+vocab.json filter=lfs diff=lfs merge=lfs -text
+*.png filter=lfs diff=lfs merge=lfs -text
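With the new `.gitattributes` rules above, the JSON configs, tokenizer files, and PNG assets added in this commit are stored through Git LFS, so a plain `git clone` without LFS installed only yields three-line pointer stubs. As a minimal sketch (assuming the standard `huggingface_hub` client, which resolves LFS pointers to the real file content), a single file from this repo can be fetched like this:

```python
# Sketch: fetch one LFS-tracked file from this repo via huggingface_hub.
# hf_hub_download returns a local path to the resolved file content,
# not the 3-line Git LFS pointer stub stored in the git tree.
from huggingface_hub import hf_hub_download

config_path = hf_hub_download(
    repo_id="InternRobotics/G2VLM-2B-MoT",  # repo referenced in the README below
    filename="config.json",                  # one of the newly LFS-tracked files
)
print(config_path)
```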
README.md CHANGED

@@ -1,3 +1,73 @@
 ---
-license:
+license: apache-2.0
+language:
+- en
+pipeline_tag: image-text-to-text
+tags:
+- multimodal
+library_name: transformers
+base_model:
+- Qwen/Qwen2-VL-2B
 ---
+
+# G2VLM-2B-MoT
+## Geometry Grounded Vision Language Model with Unified 3D Reconstruction and Spatial Reasoning
+
+<p align="left">
+  <img src="https://huggingface.co/InternRobotics/G2VLM-2B-MoT/resolve/main/assets/icon.png" alt="G2VLM" width="200"/>
+</p>
+
+
+<p align="left">
+  <a href="https://gordonhu608.github.io/g2vlm.github.io/">
+    <img
+      src="https://img.shields.io/badge/G2VLM-Website-0A66C2?logo=safari&logoColor=white" style="display: inline-block; vertical-align: middle;"
+      alt="G2VLM Website"
+    />
+  </a>
+  <a href="https://arxiv.org/abs/2511.21688">
+    <img
+      src="https://img.shields.io/badge/G2VLM-Paper-red?logo=arxiv&logoColor=red" style="display: inline-block; vertical-align: middle;"
+      alt="G2VLM Paper on arXiv"
+    />
+  </a>
+  <a href="https://github.com/InternRobotics/G2VLM" target="_blank" style="margin: 2px;">
+    <img
+      src="https://img.shields.io/badge/G2VLM-Codebase-536af5?color=536af5&logo=github" style="display: inline-block; vertical-align: middle;"
+      alt="G2VLM Codebase"
+    />
+  </a>
+</p>
+
+
+> We present <b>G<sup>2</sup>VLM</b>, a geometry-grounded vision-language model proficient in both spatial 3D reconstruction and spatial understanding tasks. For spatial reasoning questions, G<sup>2</sup>VLM can natively predict 3D geometry and employ interleaved reasoning to reach an answer.
+
+
+This repository hosts the model weights for <b>G<sup>2</sup>VLM</b>. For installation, usage instructions, and further documentation, please visit our [GitHub repository](https://github.com/InternRobotics/G2VLM).
+
+
+<p align="left"><img src="https://huggingface.co/InternRobotics/G2VLM-2B-MoT/resolve/main/assets/teaser.png" width="100%"></p>
+
+
+
+## 🧠 Method
+<i>G<sup>2</sup>VLM is a unified model that integrates a geometric perception expert for 3D reconstruction and a semantic perception expert for multimodal understanding and spatial reasoning tasks. All tokens share multi-modal self-attention in each transformer block.</i>
+
+<p align="left"><img src="https://huggingface.co/InternRobotics/G2VLM-2B-MoT/resolve/main/assets/method.png" width="100%"></p>
+
+
+## License
+G2VLM is licensed under the Apache 2.0 license.
+
+## ✍️ Citation
+```bibtex
+@article{hu2025g2vlmgeometrygroundedvision,
+  title={G$^2$VLM: Geometry Grounded Vision Language Model with Unified 3D Reconstruction and Spatial Reasoning},
+  author={Wenbo Hu and Jingli Lin and Yilin Long and Yunlong Ran and Lihan Jiang and Yifan Wang and Chenming Zhu and Runsen Xu and Tai Wang and Jiangmiao Pang},
+  year={2025},
+  eprint={2511.21688},
+  archivePrefix={arXiv},
+  primaryClass={cs.CV},
+  url={https://arxiv.org/abs/2511.21688},
+}
+```
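The card metadata above declares `library_name: transformers` and `pipeline_tag: image-text-to-text`, but this commit ships only weights and configs and defers installation and usage to the linked GitHub repository. Purely as a speculative sketch, assuming the project exposes a standard auto-class entry point with remote code (not confirmed by this commit), loading might look roughly like:

```python
# Speculative sketch only: this commit contains no modeling code, and the README
# points to https://github.com/InternRobotics/G2VLM for the supported workflow.
# The auto-class pattern below is an assumption based on library_name: transformers.
from transformers import AutoModel, AutoProcessor

repo_id = "InternRobotics/G2VLM-2B-MoT"
processor = AutoProcessor.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True)
```

For the authors' supported usage, follow the instructions in the GitHub repository referenced in the card.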
assets/icon.png ADDED (Git LFS)

assets/method.png ADDED (Git LFS)

assets/teaser.png ADDED (Git LFS)
chat_template.json ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:ad60d90252ed0b0705ba14e2d0ad0fec0beac1ea955642b54059b36052d8bc96
+size 1050

config.json ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:422adefa19e62dd175961cec85bc0400344fe5bf9b22bd1182e05aaae78556e0
+size 1196

dino_config.json ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:03eee42f646659a9480f8911a81fdd81efeedd7ff39083c8e36398068daf72f5
+size 1003

generation_config.json ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d2864bf1edea5863d331edfff48106b586a366f5a2c41aa77731fadc53aa25d2
+size 272

preprocessor_config.json ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:b5eaad0c2815f07631535dcc58f3c462b0d73693638ad21d19f3c50820eae1cc
+size 347

text_config.json ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:422adefa19e62dd175961cec85bc0400344fe5bf9b22bd1182e05aaae78556e0
+size 1196

tokenizer.json ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:cb63a0a23eef3d5b01063a9880a1925a65aaf4d1591d519910ee3527852950a0
+size 7029741

tokenizer_config.json ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:ff5c4fd898fe8c39591eb70e5d39d2782802d4204d6ae9ba1223252f354842a0
+size 4190

vit_config.json ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:e376158b1b95be08e1aab39196db5103a9b7961b8a7afe9682b066cd744c6964
+size 218

vocab.json ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:ca10d7e9fb3ed18575dd1e277a2579c16d108e32f27439684afa0e10b1440910
+size 2776833
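Each of the files above was committed as a Git LFS pointer with the three fields shown in the diffs: a spec `version`, a SHA-256 `oid`, and a byte `size`. A minimal sketch for parsing such a pointer and checking a downloaded blob against it (the helper names are illustrative, not part of any repository tooling):

```python
# Sketch: parse a Git LFS pointer stub (version / oid sha256:<hex> / size <bytes>)
# and verify that a locally downloaded blob matches its hash and size.
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Turn a three-line LFS pointer into {'version', 'oid', 'size'}."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    return {
        "version": fields["version"],
        "oid": fields["oid"].removeprefix("sha256:"),
        "size": int(fields["size"]),
    }


def matches_pointer(blob_path: Path, pointer: dict) -> bool:
    """True if the file's size and SHA-256 digest match the pointer."""
    data = blob_path.read_bytes()
    return (len(data) == pointer["size"]
            and hashlib.sha256(data).hexdigest() == pointer["oid"])


# Example with the chat_template.json pointer from this commit:
pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:ad60d90252ed0b0705ba14e2d0ad0fec0beac1ea955642b54059b36052d8bc96\n"
    "size 1050\n"
)
# matches_pointer(Path("chat_template.json"), pointer) should be True
# for the resolved (non-pointer) file.
```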