---
license: mit
pipeline_tag: image-to-3d
---

# LiteVGGT: Boosting Vanilla VGGT via Geometry-aware Cached Token Merging

LiteVGGT is a 3D vision foundation model that significantly boosts vanilla VGGT's performance by achieving up to 10× speedup and substantial memory reduction. This enables efficient processing of large-scale scenes (up to 1000 images) for 3D reconstruction, while maintaining high accuracy in camera pose and point cloud prediction. The method introduces a geometry-aware cached token merging strategy to optimize anchor token selection and reuse merge indices, preserving key geometric information with minimal accuracy impact.

This model was presented in the paper: [LiteVGGT: Boosting Vanilla VGGT via Geometry-aware Cached Token Merging](https://huggingface.co/papers/2512.04939).

- [Project Page](https://garlicba.github.io/LiteVGGT/)
- [Code](https://github.com/GarlicBa/LiteVGGT-repo)

## Overview

For 1000 input images, LiteVGGT achieves a **10× speedup** over VGGT while maintaining high accuracy in camera pose and point cloud prediction. Its scalability and robustness make large-scale scene reconstruction more efficient and reliable.
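To make the idea of cached token merging more concrete, here is a minimal pure-Python sketch. It is not the LiteVGGT implementation: the function names are hypothetical, anchors are chosen with a fixed stride (the paper describes a geometry-aware selection instead), and similarity is a plain dot product. It only illustrates the general pattern of computing merge indices once and reusing them across layers.

```python
# Illustrative sketch only; NOT the LiteVGGT code.
# Assumptions: tokens are lists of floats, anchors are picked by a fixed
# stride (a stand-in for geometry-aware selection), similarity is a dot product.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def build_merge_indices(tokens, stride):
    """Pick every `stride`-th token as an anchor and map each token to an anchor group."""
    anchors = list(range(0, len(tokens), stride))
    assign = []
    for i, tok in enumerate(tokens):
        if i in anchors:
            assign.append(anchors.index(i))       # an anchor belongs to its own group
        else:
            sims = [dot(tok, tokens[a]) for a in anchors]
            assign.append(sims.index(max(sims)))  # most similar anchor wins
    return assign, len(anchors)

def merge(tokens, assign, num_groups):
    """Average all tokens that share a group, yielding one token per anchor."""
    dim = len(tokens[0])
    sums = [[0.0] * dim for _ in range(num_groups)]
    counts = [0] * num_groups
    for tok, g in zip(tokens, assign):
        counts[g] += 1
        for d in range(dim):
            sums[g][d] += tok[d]
    return [[v / counts[g] for v in sums[g]] for g in range(num_groups)]

def unmerge(merged, assign):
    """Copy each group's token back to every position that was merged into it."""
    return [list(merged[g]) for g in assign]

tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
assign, k = build_merge_indices(tokens, stride=2)  # computed once, then cached
for _layer in range(3):                            # later layers reuse the cached indices
    merged = merge(tokens, assign, k)
    # A real model would run its attention block on `merged` here.
    tokens = unmerge(merged, assign)
```

In this toy setup, the similarity search runs only once, while every subsequent layer operates on the shorter merged sequence. That is the source of the savings the sketch is meant to convey.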


## Run Demo

To quickly try out LiteVGGT for 3D reconstruction, follow these steps.

First, create a virtual environment using Conda, clone this repository to your local machine, and install the required dependencies:

```bash
conda create -n litevggt python=3.10
conda activate litevggt
git clone git@github.com:GarlicBa/LiteVGGT-repo.git
cd LiteVGGT-repo
pip install -r requirements.txt
```

Install the Transformer Engine package following its official installation requirements (see https://github.com/NVIDIA/TransformerEngine):

```bash
export CC=your/gcc/path
export CXX=your/g++/path
pip install --no-build-isolation transformer_engine[pytorch]
```

Then, download our LiteVGGT checkpoint that has been **finetuned** and **TE-remapped**:

```bash
wget https://huggingface.co/ZhijianShu/LiteVGGT/resolve/main/te_dict.pt
```

Finally, run the demo:

```bash
python run_demo.py \
    --ckpt_path path/to/your/te_dict.pt \
    --img_dir path/to/your/img_dir \
    --output_dir ./recon_result
```
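Before launching a long reconstruction, it can help to check that the checkpoint and image folder are where you expect them. The helper below is a hypothetical convenience wrapper, not part of the repository: `build_demo_command` and `IMAGE_SUFFIXES` are names invented for this sketch, and the list of accepted image extensions is an assumption. It uses only the three `run_demo.py` flags shown above.

```python
import sys
from pathlib import Path

# Assumption: the extensions the demo accepts are not documented here,
# so this set is only a guess for the pre-flight check.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}

def build_demo_command(ckpt_path, img_dir, output_dir="./recon_result"):
    """Validate the inputs and return (command list, number of images found)."""
    ckpt = Path(ckpt_path)
    imgs = Path(img_dir)
    if not ckpt.is_file():
        raise FileNotFoundError(f"checkpoint not found: {ckpt}")
    if not imgs.is_dir():
        raise NotADirectoryError(f"image directory not found: {imgs}")
    images = sorted(p for p in imgs.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES)
    if not images:
        raise ValueError(f"no images found in {imgs}")
    cmd = [
        sys.executable, "run_demo.py",
        "--ckpt_path", str(ckpt),
        "--img_dir", str(imgs),
        "--output_dir", str(output_dir),
    ]
    return cmd, len(images)

# Example use from the repository root (not executed here):
#   import subprocess
#   cmd, n = build_demo_command("te_dict.pt", "path/to/your/img_dir")
#   subprocess.run(cmd, check=True)
```

Returning the command as a list rather than a string keeps paths with spaces intact when it is passed to `subprocess.run`.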