CronusVLA
Collection: Paper, Data and Checkpoints for "CronusVLA: Towards Efficient and Robust Manipulation via Multi-Frame Vision-Language-Action Modeling"
checkpoints/*.pt: Complete model checkpoint of CronusVLA-7B for direct evaluation on LIBERO-Spatial. It conditions on the textual instruction, a third-person image, and a wrist-view image (no robot state).
*.txt: Evaluation results with 3 random seeds
dataset_statistics.json: Normalization statistics for the dataset
To evaluate this checkpoint or further fine-tune from it, please refer to the CronusVLA repository for more details.
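Before launching a full evaluation, you can sanity-check the released files locally. The snippet below is a minimal sketch, not part of the official codebase: it assumes the checkpoint is a standard PyTorch .pt file and that dataset_statistics.json is plain JSON, and it only prints whatever structure it finds.

import glob
import json

import torch

# Pick the first released checkpoint under ./checkpoints
# (mirrors CHECKPOINT_DIR in the evaluation script below).
ckpt_path = sorted(glob.glob("checkpoints/*.pt"))[0]
# On PyTorch >= 2.6 you may need torch.load(..., weights_only=False).
state = torch.load(ckpt_path, map_location="cpu")

# The internal layout of the checkpoint is an assumption; print the
# top-level keys to see what is actually stored.
if isinstance(state, dict):
    print(ckpt_path, "->", list(state.keys())[:10])

# dataset_statistics.json holds the normalization statistics used for
# LIBERO-Spatial actions.
with open("dataset_statistics.json") as f:
    stats = json.load(f)
print(json.dumps(stats, indent=2)[:400])

The script below then evaluates every checkpoint in CHECKPOINT_DIR on LIBERO-Spatial, distributing the runs across the available GPUs: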
#!/bin/bash
# GPUs to use; adjust to match your machine.
CUDA_DEVICES=(0 1 2 3 4 5 6 7)
# Directory holding the pre-trained checkpoints.
CHECKPOINT_DIR=./checkpoints
CHECKPOINTS=($(ls "$CHECKPOINT_DIR"/*.pt | sort))
# Number of available CUDA devices.
NUM_CUDA_DEVICES=${#CUDA_DEVICES[@]}

INDEX=0
for CHECKPOINT in "${CHECKPOINTS[@]}"; do
    # Assign checkpoints to GPUs in round-robin order.
    CUDA_DEVICE=${CUDA_DEVICES[$((INDEX % NUM_CUDA_DEVICES))]}
    echo "Running on GPU $CUDA_DEVICE with checkpoint $CHECKPOINT"
    CUDA_VISIBLE_DEVICES=$CUDA_DEVICE python experiments/Libero/robot/libero/run_libero_eval.py \
        --model_family cronus \
        --pretrained_checkpoint "$CHECKPOINT" \
        --task_suite_name libero_spatial \
        --center_crop True \
        --use_wrist_image True &
    sleep 2
    ((INDEX++))
done
wait
If you find this model useful, please cite our work:
@article{li2025cronusvla,
  title   = {CronusVLA: Transferring Latent Motion Across Time for Multi-Frame Prediction in Manipulation},
  author  = {Li, Hao and Yang, Shuai and Chen, Yilun and Tian, Yang and Yang, Xiaoda and Chen, Xinyi and Wang, Hanqing and Wang, Tai and Zhao, Feng and Lin, Dahua and others},
  journal = {arXiv preprint arXiv:2506.19816},
  year    = {2025}
}
Base model: CogACT/CogACT-Base