✨ Introduction
World models are emerging as a foundational paradigm for scalable, data-efficient embodied AI. In this work, we present GigaWorld-0, a unified world model framework designed explicitly as a data engine for Vision-Language-Action (VLA) learning. GigaWorld-0 integrates two synergistic components: GigaWorld-0-Video, which leverages large-scale video generation to produce diverse, texture-rich, and temporally coherent embodied sequences under fine-grained control of appearance, camera viewpoint, and action semantics; and GigaWorld-0-3D, which combines 3D generative modeling, 3D Gaussian Splatting reconstruction, physically differentiable system identification, and executable motion planning to ensure geometric consistency and physical realism.
🗺️ Architecture
GigaWorld-0-Video-Dreamer is our foundation video generation model. It performs image-text-to-video (IT2V) generation: given an input image and a text prompt, it generates a video conditioned on both.
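To make the IT2V input/output contract concrete, here is a minimal sketch in Python. Everything in it is hypothetical and chosen for illustration: `IT2VRequest`, `placeholder_it2v`, and the field names are assumptions, not the GigaWorld-0 API, and the stub performs no generation. It only shows the shape of the task, which is one conditioning image plus one prompt in and a sequence of frames out.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical types for illustration only; this is NOT the GigaWorld-0 API.
Frame = List[List[int]]  # a tiny grayscale image stored as rows of pixel values


@dataclass
class IT2VRequest:
    image: Frame      # conditioning image the clip should start from
    prompt: str       # text describing the desired scene or motion
    num_frames: int   # number of frames to return


def placeholder_it2v(request: IT2VRequest) -> List[Frame]:
    """Stand-in for a real model: returns independent copies of the input frame.

    A real IT2V model would synthesize new frames guided by the prompt;
    this stub only demonstrates the input/output contract.
    """
    if request.num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    return [[row[:] for row in request.image] for _ in range(request.num_frames)]


request = IT2VRequest(
    image=[[0, 255], [255, 0]],
    prompt="the robot arm picks up the cup",
    num_frames=4,
)
video = placeholder_it2v(request)
```

Swapping the stub for an actual model call would leave the calling code unchanged as long as the model accepts an image and a prompt and returns a list of frames, which is the only property this sketch assumes.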
Citation
If you use GigaWorld-0 in your research, please cite:
@misc{gigaai2025gigaworld0,
  title={GigaWorld-0: World Models as Data Engine to Empower Embodied AI},
  author={GigaAI},
  year={2025},
  eprint={2511.19861},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2511.19861},
}
