ZhijianShu
/

LiteVGGT

Image-to-3D

Improve model card: add pipeline tag, paper, project, code links, and full usage

by nielsr HF Staff - opened 9 days ago

base: refs/heads/main

←

from: refs/pr/1

README.md +71 -3

@@ -1,3 +1,71 @@
----
-license: mit
----

+---
+license: mit
+pipeline_tag: image-to-3d
+---
+# LiteVGGT: Boosting Vanilla VGGT via Geometry-aware Cached Token Merging
+LiteVGGT is a 3D vision foundation model that speeds up vanilla VGGT by up to 10× and substantially reduces its memory use. This makes it practical to process large-scale scenes (up to 1000 images) for 3D reconstruction while maintaining high accuracy in camera pose and point cloud prediction. The method introduces a geometry-aware cached token merging strategy that optimizes anchor token selection and reuses merge indices, preserving key geometric information with minimal accuracy impact.
+This model was presented in the paper: [LiteVGGT: Boosting Vanilla VGGT via Geometry-aware Cached Token Merging](https://huggingface.co/papers/2512.04939).
+- 🏠 [Project Page](https://garlicba.github.io/LiteVGGT/)
+- 💻 [Code](https://github.com/GarlicBa/LiteVGGT-repo)
+## Overview
+For 1000 input images, LiteVGGT achieves a **10× speedup** over VGGT while maintaining high accuracy in camera pose and point cloud prediction. Its scalability and robustness make large-scale scene reconstruction more efficient and reliable.
+<p align="center">
+  <img src="https://github.com/GarlicBa/LiteVGGT-repo/raw/main/assets/teaser.png" alt="teaser" width="100%">
+</p>
+## Run Demo
+To quickly try out LiteVGGT for 3D reconstruction, follow these steps:
+1.  **Environment Setup:**
+    First, create a virtual environment using Conda, clone the code repository to your local machine, and install the required dependencies.
+    ```bash
+    conda create -n litevggt python=3.10
+    conda activate litevggt
+    git clone https://github.com/GarlicBa/LiteVGGT-repo.git
+    cd LiteVGGT-repo
+    pip install -r requirements.txt
+    ```
+2.  **Install Transformer Engine:**
+    Install the Transformer Engine package following its official installation requirements (see https://github.com/NVIDIA/TransformerEngine):
+    ```bash
+    export CC=your/gcc/path
+    export CXX=your/g++/path
+    pip install --no-build-isolation "transformer_engine[pytorch]"
+    ```
+3.  **Download Checkpoint:**
+    Then, download our LiteVGGT checkpoint that has been **finetuned** and **TE-remapped**:
+    ```bash
+    wget https://huggingface.co/ZhijianShu/LiteVGGT/resolve/main/te_dict.pt
+    ```
+4.  **Run Inference:**
+    ```bash
+    python run_demo.py \
+      --ckpt_path path/to/your/te_dict.pt \
+      --img_dir path/to/your/img_dir \
+      --output_dir ./recon_result \
+    ```
+## Citation
+If you find this project helpful, please consider citing the VGGT paper that LiteVGGT builds on:
+```bibtex
+@inproceedings{wang2025vggt,
+  title={VGGT: Visual Geometry Grounded Transformer},
+  author={Wang, Jianyuan and Chen, Minghao and Karaev, Nikita and Vedaldi, Andrea and Rupprecht, Christian and Novotny, David},
+  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
+  year={2025}
+}
+```
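
Before launching step 4 of Run Demo, it can help to confirm that the checkpoint download finished and that the image directory is populated. The following is a minimal sketch and is not part of the LiteVGGT repository: the `preflight` function name is hypothetical, and the check for `.jpg`, `.jpeg`, or `.png` files directly inside the image directory is an assumption, since the README does not state which formats `run_demo.py` accepts.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helper; NOT part of the LiteVGGT repository.
# It checks the two inputs passed to run_demo.py in step 4.

# Usage: preflight <ckpt_path> <img_dir>
# Returns 0 if both inputs look usable, 1 otherwise.
preflight() {
  local ckpt_path="$1" img_dir="$2"

  # The checkpoint must exist and be non-empty
  # (an interrupted download can leave an empty file behind).
  if [ ! -s "$ckpt_path" ]; then
    echo "checkpoint missing or empty: $ckpt_path" >&2
    return 1
  fi

  # The image directory must exist.
  if [ ! -d "$img_dir" ]; then
    echo "image directory not found: $img_dir" >&2
    return 1
  fi

  # ASSUMPTION: inputs are .jpg/.jpeg/.png files directly inside img_dir.
  local count
  count=$(find "$img_dir" -maxdepth 1 -type f \
            \( -iname '*.jpg' -o -iname '*.jpeg' -o -iname '*.png' \) \
          | wc -l | tr -d '[:space:]')
  if [ "$count" -eq 0 ]; then
    echo "no .jpg/.jpeg/.png images in: $img_dir" >&2
    return 1
  fi

  echo "ok: $count image(s), checkpoint $ckpt_path"
}
```

After sourcing the function, `preflight path/to/your/te_dict.pt path/to/your/img_dir && python run_demo.py --ckpt_path path/to/your/te_dict.pt --img_dir path/to/your/img_dir --output_dir ./recon_result` starts the demo only when both checks pass.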