---
license: mit
pipeline_tag: image-to-3d
---

# LiteVGGT: Boosting Vanilla VGGT via Geometry-aware Cached Token Merging

LiteVGGT is a 3D vision foundation model that speeds up vanilla VGGT by up to 10× and substantially reduces its memory use. This makes 3D reconstruction of large-scale scenes (up to 1000 images) practical while maintaining high accuracy in camera pose and point cloud prediction.

The method introduces a geometry-aware cached token merging strategy: it optimizes the selection of anchor tokens and reuses merge indices, preserving key geometric information with minimal impact on accuracy.

This model was presented in the paper [LiteVGGT: Boosting Vanilla VGGT via Geometry-aware Cached Token Merging](https://huggingface.co/papers/2512.04939).

- [Project Page](https://garlicba.github.io/LiteVGGT/)
- [Code](https://github.com/GarlicBa/LiteVGGT-repo)

## Overview

For 1000 input images, LiteVGGT achieves a **10× speedup** over VGGT while maintaining high accuracy in camera pose and point cloud prediction. Its scalability and robustness make large-scale scene reconstruction more efficient and reliable.
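
The idea of cached token merging mentioned above can be illustrated with a small, dependency-free sketch. This is not the LiteVGGT implementation: the function names, the per-token `scores` used to pick anchors, and the cosine-similarity assignment are all assumptions chosen for illustration. It only shows the general pattern of computing merge indices once and reusing them across several layers.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def build_merge_indices(tokens, scores, num_anchors):
    """Pick anchors and map every token to its most similar anchor slot.

    `scores` is a hypothetical per-token importance value standing in for
    a geometry cue; the highest-scoring tokens are kept as anchors.
    """
    order = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    anchors = sorted(order[:num_anchors])
    assign = []
    for i, tok in enumerate(tokens):
        if i in anchors:
            assign.append(anchors.index(i))  # an anchor maps to its own slot
        else:
            assign.append(max(range(len(anchors)),
                              key=lambda k: cosine(tok, tokens[anchors[k]])))
    return assign

def merge(tokens, assign, num_anchors):
    """Average all tokens that share an anchor slot."""
    dim = len(tokens[0])
    sums = [[0.0] * dim for _ in range(num_anchors)]
    counts = [0] * num_anchors
    for tok, k in zip(tokens, assign):
        counts[k] += 1
        for d in range(dim):
            sums[k][d] += tok[d]
    return [[v / c for v in s] for s, c in zip(sums, counts)]

def unmerge(merged, assign):
    """Restore the original token count by copying merged tokens back."""
    return [list(merged[k]) for k in assign]

# Toy input: four 2-D tokens forming two obvious groups.
tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
scores = [0.9, 0.1, 0.8, 0.2]  # made-up importance values

# Merge indices are computed once and cached.
cached = build_merge_indices(tokens, scores, num_anchors=2)

for _ in range(3):  # stand-in for several consecutive layers
    merged = merge(tokens, cached, 2)
    # An expensive operation such as attention would run on `merged` here.
    tokens = unmerge(merged, cached)
```

In this sketch the expensive step sees two tokens instead of four, and the similarity search runs only once rather than once per layer, which is the kind of saving that reusing merge indices is meant to provide.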