Pi0.5-LIBERO W8A8 Quantized Model

This is a W8A8 (INT8 weights + INT8 activations) quantized version of Pi0.5-LIBERO, using BitBLAS for efficient INT8 Tensor Core computation.

Model Description

  • Architecture: Pi0.5 (PaliGemma 2B VLM + Gemma 300M Action Expert)
  • Quantization: W8A8 (INT8 weights, INT8 activations) with per-channel weight scales and per-tensor activation scales
  • Backend: BitBLAS for INT8 Tensor Core acceleration
  • W8A8 Layers: 180 Linear layers replaced with BitBLASW8A8Linear
  • Model Size: ~4.90GB (quantized) vs ~6.96GB (FP16)
  • Task: Robot manipulation (LIBERO benchmark)

Performance

Tested on the LIBERO benchmark:

Task Suite        Success Rate
libero_spatial    100% (10/10)

Inference speed on NVIDIA A40:

  • First inference: ~30s (BitBLAS kernel compilation/caching)
  • Subsequent inference: ~150ms per step
  • Memory usage: ~4GB VRAM
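
The first-call cost can be paid once up front with a warm-up call. A minimal sketch, assuming the policy and obs objects constructed in the Quick Start section below:

import time

t0 = time.perf_counter()
policy.infer(obs)  # first call: BitBLAS compiles and caches the INT8 kernels
print(f"warm-up: {time.perf_counter() - t0:.1f}s")

t0 = time.perf_counter()
policy.infer(obs)  # later calls reuse the cached kernels
print(f"per-step: {(time.perf_counter() - t0) * 1000:.0f}ms")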

Installation

Prerequisites

  • Python 3.11
  • CUDA 12.1+ compatible GPU (tested on NVIDIA A40)
  • Linux (Ubuntu 22.04 recommended)

Step 1: Create Conda Environment

conda create -n openpi_w8a8 python=3.11 -y
conda activate openpi_w8a8

Step 2: Install PyTorch

pip install torch==2.7.1 torchvision
pip install 'numpy<2.0.0'
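
Optional sanity check that PyTorch sees your GPU:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"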

Step 3: Install HuggingFace Packages

pip install transformers==4.53.2 accelerate safetensors huggingface_hub einops

Step 4: Install BitBLAS and Robot Simulation

pip install bitblas scipy mujoco matplotlib
pip install robosuite==1.4.1
pip install bddl easydict
pip install 'numpy<2.0.0'

Step 5: Install JAX and Flax

pip install 'jax[cuda12]==0.5.3' flax==0.10.2 orbax-checkpoint==0.11.13
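
Optional check that JAX can see the GPU as well:

python -c "import jax; print(jax.devices())"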

Step 6: Install Other Dependencies

pip install sentencepiece draccus==0.10.0 tyro wandb polars numpydantic augmax \
    beartype==0.19.0 equinox jaxtyping==0.2.36 ml-collections==1.0.0 \
    imageio tqdm-loggable flatbuffers Pillow

Step 7: Install LeRobot

pip install 'lerobot @ git+https://github.com/huggingface/lerobot@0cf864870cf29f4738d3ade893e6fd13fbd7cdb5' --no-deps
pip install datasets h5py zarr diffusers hydra-core gym jsonlines av torchcodec

Step 8: Clone and Install OpenPI

git clone https://github.com/JingxuanZhang77/openpi_duquant.git openpi
pip install -e openpi/packages/openpi-client --no-deps
pip install -e openpi --no-deps
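
A quick import check confirms both packages are installed (the openpi_client import name is an assumption based on the package layout):

python -c "import openpi, openpi_client; print('openpi OK')"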

Step 9: Clone and Install LIBERO (for evaluation)

git clone https://github.com/Lifelong-Robot-Learning/LIBERO.git
touch LIBERO/libero/__init__.py
pip install -e LIBERO --no-deps

Step 10: Copy Custom Transformers Files

OpenPI ships patched transformers model files (the transformers_replace directory) that must overwrite the stock implementations in your installed transformers package:

SITE_PACKAGES=$(python -c "import site; print(site.getsitepackages()[0])")
cp -r openpi/src/openpi/models_pytorch/transformers_replace/models/* $SITE_PACKAGES/transformers/models/

Step 11: Set Environment Variables

# Adjust the CUDA path below to match your installed version (12.1+ works)
export LD_LIBRARY_PATH=/usr/local/cuda-12.8/lib64:$LD_LIBRARY_PATH
export OPENPI_DISABLE_TORCH_COMPILE=1

Quick Start

import os
os.environ["OPENPI_DISABLE_TORCH_COMPILE"] = "1"

from openpi.models_pytorch.bitblas_w8a8_layers import load_w8a8_policy

# Load model from HuggingFace (downloads automatically)
policy = load_w8a8_policy(
    "fatdove/pi05-libero-w8a8",
    policy_config_name="pi05_libero",
    enable_tuning=False,
)

print(f"Model loaded! W8A8 layers: {policy._w8a8_layer_count}")  # 180

# Run inference
import numpy as np
obs = {
    "observation/image": np.random.randint(0, 255, (224, 224, 3), dtype=np.uint8),
    "observation/wrist_image": np.random.randint(0, 255, (224, 224, 3), dtype=np.uint8),
    "observation/state": np.random.randn(8).astype(np.float32),
    "prompt": "pick up the red cube and place it on the blue plate",
}
result = policy.infer(obs)
print(f"Actions shape: {result['actions'].shape}")  # (10, 7)

Run LIBERO Evaluation

After completing the installation, you can run the LIBERO evaluation:

# Quick test (1 trial per task)
python run_libero_w8a8.py --task-suite libero_spatial --num-trials 1

# Full evaluation (20 trials per task)
python run_libero_w8a8.py --task-suite libero_spatial --num-trials 20

Model Files

  • model.safetensors - Quantized weights (includes both W8A8 and non-quantized layers)
  • w8a8_config.json - Quantization configuration (layer names, scales info)
  • assets/ - Normalization statistics for input preprocessing
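
To inspect the quantization configuration without downloading the full checkpoint, you can fetch just that file from the Hub; a small sketch (it prints whatever top-level keys the config actually contains):

import json
from huggingface_hub import hf_hub_download

# Downloads only w8a8_config.json, not the multi-GB safetensors file
config_path = hf_hub_download("fatdove/pi05-libero-w8a8", "w8a8_config.json")
with open(config_path) as f:
    cfg = json.load(f)
print(list(cfg.keys()))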

Quantization Details

The W8A8 quantization uses:

  • Weight quantization: Per-channel INT8 with symmetric quantization
  • Activation quantization: Per-tensor INT8 with dynamic quantization
  • Backend: BitBLAS Matmul kernels optimized for NVIDIA Tensor Cores

180 Linear layers are quantized, including:

  • PaliGemma VLM: All attention (q_proj, k_proj, v_proj, o_proj) and MLP (gate_proj, up_proj, down_proj) layers
  • Gemma Expert: All MLP layers
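
For intuition, here is a minimal sketch of the two quantizers (illustrative only; the actual BitBLASW8A8Linear implementation lives in the openpi repository):

import torch

def quant_weight_per_channel(w):
    # Symmetric INT8 with one scale per output channel (per weight row)
    scale = w.abs().amax(dim=1, keepdim=True) / 127.0
    return torch.clamp((w / scale).round(), -127, 127).to(torch.int8), scale

def quant_act_per_tensor(x):
    # Dynamic symmetric INT8 with a single scale for the whole tensor
    scale = x.abs().max() / 127.0
    return torch.clamp((x / scale).round(), -127, 127).to(torch.int8), scale

w, x = torch.randn(4, 8), torch.randn(2, 8)
qw, sw = quant_weight_per_channel(w)
qx, sx = quant_act_per_tensor(x)
# The real kernel runs INT8 x INT8 -> INT32 on Tensor Cores; int64 here
# keeps the demo CPU-friendly. One rescale recovers the FP result.
y = (qx.long() @ qw.t().long()).float() * sx * sw.t()
print((y - x @ w.t()).abs().max())  # small quantization error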

Troubleshooting

CUDA/BitBLAS Issues

Make sure CUDA 12.x is installed and LD_LIBRARY_PATH includes its lib64 directory (adjust the version in the path to your system):

export LD_LIBRARY_PATH=/usr/local/cuda-12.8/lib64:$LD_LIBRARY_PATH

Import Errors

If you get import errors like cannot import name 'ACT2FN', ensure you've copied the transformers_replace files:

SITE_PACKAGES=$(python -c "import site; print(site.getsitepackages()[0])")
cp -r openpi/src/openpi/models_pytorch/transformers_replace/models/* $SITE_PACKAGES/transformers/models/

NumPy Version Errors

OpenPI requires numpy<2.0:

pip install 'numpy<2.0.0'

Citation

If you use this model, please cite:


License

Apache 2.0. See the OpenPI repository for more details.
