GigaWorld-0: World Models as Data Engine to Empower Embodied AI

✨ Introduction

World models are emerging as a foundational paradigm for scalable, data-efficient embodied AI. In this work, we present GigaWorld-0, a unified world model framework designed explicitly as a data engine for Vision-Language-Action (VLA) learning. GigaWorld-0 integrates two synergistic components: GigaWorld-0-Video, which leverages large-scale video generation to produce diverse, texture-rich, and temporally coherent embodied sequences under fine-grained control of appearance, camera viewpoint, and action semantics; and GigaWorld-0-3D, which combines 3D generative modeling, 3D Gaussian Splatting reconstruction, physically differentiable system identification, and executable motion planning to ensure geometric consistency and physical realism.

🗺️ Architecture

GigaWorld-0-Video-Dreamer is our foundation video generation model. It supports image-text-to-video (IT2V) generation.

[Figure: GigaWorld-0-Video-Dreamer architecture]

📖 Citation

If you use GigaWorld-0 in your research, please cite:

@misc{gigaai2025gigaworld0,
      title={GigaWorld-0: World Models as Data Engine to Empower Embodied AI},
      author={GigaAI},
      year={2025},
      eprint={2511.19861},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2511.19861},
}