CronusVLA
Collection: Paper, Data and Checkpoints for "CronusVLA: Towards Efficient and Robust Manipulation via Multi-Frame Vision-Language-Action Modeling"
checkpoints/*.pt: Complete model checkpoint of CronusVLA-7B for direct evaluation on LIBERO-Spatial. It conditions on the textual instruction, a third-person image, and a wrist-view image (no robot state).
*.txt: Evaluation results with 3 random seeds
dataset_statistics.json: Normalization statistics for the dataset
To evaluate this checkpoint or further fine-tune from it, please refer to the CronusVLA repository for more details.
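Before launching a full evaluation, you can sanity-check the released files locally. The snippet below is a minimal sketch, not part of the official codebase: it assumes the checkpoint is a standard PyTorch .pt file and that dataset_statistics.json is plain JSON, and it only prints whatever structure it finds.

import glob
import json

import torch

# Pick the first released checkpoint under ./checkpoints
# (mirrors CHECKPOINT_DIR in the evaluation script below).
ckpt_path = sorted(glob.glob("checkpoints/*.pt"))[0]
# On PyTorch >= 2.6 you may need torch.load(..., weights_only=False).
state = torch.load(ckpt_path, map_location="cpu")

# The internal layout of the checkpoint is an assumption; print the
# top-level keys to see what is actually stored.
if isinstance(state, dict):
    print(ckpt_path, "->", list(state.keys())[:10])

# dataset_statistics.json holds the normalization statistics used for
# LIBERO-Spatial actions.
with open("dataset_statistics.json") as f:
    stats = json.load(f)
print(json.dumps(stats, indent=2)[:400])

The script below then evaluates every checkpoint in CHECKPOINT_DIR on LIBERO-Spatial, distributing the runs across the available GPUs: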
#!/bin/bash
# GPUs to use; adjust to match your machine.
CUDA_DEVICES=(0 1 2 3 4 5 6 7)
# Directory holding the pre-trained checkpoints.
CHECKPOINT_DIR=./checkpoints
CHECKPOINTS=($(ls "$CHECKPOINT_DIR"/*.pt | sort))
# Number of available CUDA devices.
NUM_CUDA_DEVICES=${#CUDA_DEVICES[@]}

INDEX=0
for CHECKPOINT in "${CHECKPOINTS[@]}"; do
    # Assign checkpoints to GPUs in round-robin order.
    CUDA_DEVICE=${CUDA_DEVICES[$((INDEX % NUM_CUDA_DEVICES))]}
    echo "Running on GPU $CUDA_DEVICE with checkpoint $CHECKPOINT"
    CUDA_VISIBLE_DEVICES=$CUDA_DEVICE python experiments/Libero/robot/libero/run_libero_eval.py \
        --model_family cronus \
        --pretrained_checkpoint "$CHECKPOINT" \
        --task_suite_name libero_spatial \
        --center_crop True \
        --use_wrist_image True &
    sleep 2
    ((INDEX++))
done
wait
If you find this model useful, please cite our work:
@article{li2025cronusvla,
  title   = {CronusVLA: Transferring Latent Motion Across Time for Multi-Frame Prediction in Manipulation},
  author  = {Li, Hao and Yang, Shuai and Chen, Yilun and Tian, Yang and Yang, Xiaoda and Chen, Xinyi and Wang, Hanqing and Wang, Tai and Zhao, Feng and Lin, Dahua and others},
  journal = {arXiv preprint arXiv:2506.19816},
  year    = {2025}
}
Base model: CogACT/CogACT-Base