
XR-1-Stage2

[Project Page] [Paper] [GitHub]

This repository contains the Stage 2 weights for the XR-1 (X Robotic Model 1) project. This is the large-scale Vision-Language-Action (VLA) policy trained on top of the Unified Vision-Motion Codes (UVMC).

🤖 Model Description

XR-1-Stage2 is a generative Transformer-based policy. While Stage 1 focused on tokenizing sensory and action data, Stage 2 performs the high-level reasoning required to map natural language instructions and visual observations into executable robotic actions.

Key Features

  • VLA Architecture: Integrates vision, language, and action into a unified Transformer framework.
  • Token-Based Policy: Operates on the discrete latent space defined by the XR-1-Stage1-UVMC tokenizer.
  • End-to-End Control: Predicts the next set of action tokens based on the current visual state and text prompt.
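To make "discrete latent space" concrete, here is a toy sketch of nearest-neighbor codebook quantization, one common way to turn continuous vectors into discrete tokens. This card does not state how the XR-1-Stage1-UVMC tokenizer is built, so the codebook, its size, and the function names `quantize` and `dequantize` are illustrative assumptions, not the XR-1 API.

```python
import math

# Hypothetical 2-D codebook with four entries. Real tokenizers learn much
# larger codebooks; these values are made up for illustration only.
CODEBOOK = [
    (0.0, 0.0),
    (1.0, 0.0),
    (0.0, 1.0),
    (1.0, 1.0),
]


def quantize(vector):
    """Return the index (token id) of the codebook entry closest to `vector`."""
    distances = [math.dist(vector, code) for code in CODEBOOK]
    return distances.index(min(distances))


def dequantize(token):
    """Map a discrete token id back to its codebook vector."""
    return CODEBOOK[token]


# A short continuous trajectory becomes a sequence of token ids.
trajectory = [(0.1, 0.05), (0.9, 0.2), (0.8, 0.95)]
tokens = [quantize(v) for v in trajectory]
reconstruction = [dequantize(t) for t in tokens]
```

A policy that works on such tokens only ever predicts integer ids; mapping ids back to continuous values is left to the tokenizer.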

🧱 Model Hierarchy

  1. Stage 1 (UVMC): Encodes images and actions into discrete tokens.
  2. Stage 2 (VLA): [This Model] Predicts action tokens given instruction and image tokens.
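The data flow between the two stages can be sketched with toy stand-ins. Everything below (`ToyTokenizer`, `ToyPolicy`, `control_step`, the binning rule, and the assumption that Stage 1 also decodes action tokens back into continuous actions) is hypothetical and only mirrors the hierarchy described above; the real interfaces are in the XR-1 GitHub repository.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ToyTokenizer:
    """Stand-in for Stage 1: maps values in [0, 1) to and from tokens."""
    num_bins: int = 8

    def encode(self, values: Sequence[float]) -> List[int]:
        # Bucket each value into one of `num_bins` discrete tokens.
        return [min(int(v * self.num_bins), self.num_bins - 1) for v in values]

    def decode(self, tokens: Sequence[int]) -> List[float]:
        # Return the centre of each bucket.
        return [(t + 0.5) / self.num_bins for t in tokens]


@dataclass
class ToyPolicy:
    """Stand-in for Stage 2: instruction + image tokens -> action tokens."""
    num_bins: int = 8

    def predict(self, instruction: str, image_tokens: Sequence[int]) -> List[int]:
        # Placeholder rule, NOT a learned model: shift every image token
        # by the number of words in the instruction.
        shift = len(instruction.split())
        return [(t + shift) % self.num_bins for t in image_tokens]


def control_step(tokenizer, policy, instruction, image):
    """One observe -> decide -> act step through both stages."""
    image_tokens = tokenizer.encode(image)                      # Stage 1
    action_tokens = policy.predict(instruction, image_tokens)   # Stage 2
    return tokenizer.decode(action_tokens)                      # Stage 1


actions = control_step(
    ToyTokenizer(), ToyPolicy(), "pick up the cup", [0.1, 0.5, 0.9]
)
```

In a real deployment this step would run in a loop, with a fresh camera observation fed in on every iteration.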

🛠 Usage

To deploy this policy, you need both the Stage 1 tokenizer and these Stage 2 weights. Please refer to the XR-1 GitHub Repository for inference scripts and environment setup.
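As a starting point, both checkpoints can be fetched with `snapshot_download` from the `huggingface_hub` library. This is a minimal sketch: the Stage 2 repo id matches this repository, while the Stage 1 repo id is an assumption derived from the tokenizer name mentioned above and should be verified on the Hub. The helper name `download_xr1_weights` is illustrative.

```python
STAGE2_REPO = "X-Humanoid/XR-1-Stage2"        # this repository
STAGE1_REPO = "X-Humanoid/XR-1-Stage1-UVMC"   # ASSUMED repo id; verify on the Hub


def download_xr1_weights(cache_dir=None):
    """Download both checkpoints and return {repo_id: local_directory}."""
    # Imported lazily so this module loads even without huggingface_hub.
    from huggingface_hub import snapshot_download

    return {
        repo: snapshot_download(repo_id=repo, cache_dir=cache_dir)
        for repo in (STAGE1_REPO, STAGE2_REPO)
    }
```

Calling `download_xr1_weights()` requires network access and `pip install huggingface_hub`; the returned directories can then be passed to the inference scripts from the GitHub repository.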

📝 Citation

If you use the XR-1 model in your research, please cite our work:

@article{fan2025xr,
  title={XR-1: Towards Versatile Vision-Language-Action Models via Learning Unified Vision-Motion Representations},
  author={Fan, Shichao and Wu, Kun and Che, Zhengping and Wang, Xinhua and Wu, Di and Liao, Fei and Liu, Ning and Zhang, Yixue and Zhao, Zhen and Xu, Zhiyuan and others},
  journal={arXiv preprint arXiv:2511.02776},
  year={2025}
}

📜 License

This project is licensed under the MIT License.


Contact: For questions, please open an issue on our GitHub or contact us at opensource@x-humanoid.com.

Discussions

If you're interested in XR-1, you are welcome to join our WeChat group for discussions.

Safetensors · Model size: 4B params · Tensor type: F32