OXE-AugE: Cross-Embodiment Robot Augmentation at Scale
Augmentation for the Open X-Embodiment (OXE) dataset
🌐 Project Page · 📄 Paper · 💻 GitHub · 🐦 Twitter
TL;DR
- What we do. We transform each source robot's demos into demos for 9 robots in total via a pipeline of source replay → segmentation & inpainting → target replay, then package everything in LeRobot format (see the loading sketch below).
- Why it matters. Policies for the same task can be trained and evaluated across embodiments and labs, enabling robust cross-embodiment learning and large-scale pretraining.
- What's included. We currently provide 16 datasets, covering 9 robots and totaling 4.4 million trajectories.
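As a quick orientation, here is a minimal sketch of what "LeRobot format" buys you: any shard opens with the stock `LeRobotDataset` loader. The repo id matches the extraction script further down; `video_backend="pyav"` mirrors that script and is a choice, not a requirement.

```python
# Minimal loading sketch (assumes the `lerobot` package is installed).
from lerobot.datasets.lerobot_dataset import LeRobotDataset

ds = LeRobotDataset("oxe-aug/jaco_play_test_0_108", video_backend="pyav")
print(len(ds))       # total number of frames across all episodes
print(ds[0].keys())  # per-frame observation/state/action keys
```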
🤖 Robots & Coverage
Legend: ⭐ = source robot | ✅ = augmented demos available. For the full, current table, see the Dashboard or dataset READMEs.

| Dataset | Panda | UR5e | Xarm7 | WidowX | Sawyer | Kinova3 | IIWA | Jaco | Google Robot | # Episodes |
|---|---|---|---|---|---|---|---|---|---|---|
| Berkeley AUTOLab UR5 | ✅ | ⭐ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 1000 |
| TACO Play | ⭐ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 3603 |
| Austin BUDS | ⭐ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 50 |
| Austin Mutex | ⭐ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 1500 |
| Austin Sailor | ⭐ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 240 |
| CMU Franka Pick-Insert | ⭐ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 631 |
| KAIST Nonprehensile | ⭐ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 201 |
| NYU Franka Play | ⭐ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 456 |
| TOTO | ⭐ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 1003 |
| UTokyo xArm PickPlace | ✅ | ✅ | ⭐ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 102 |
| UCSD Kitchen | ✅ | ✅ | ⭐ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 150 |
| Austin VIOLA | ⭐ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 150 |
| Bridge | ✅ | ✅ | ✅ | ⭐ | ✅ | ✅ | ✅ | ✅ | ✅ | 38935 |
| RT-1 Robot Action | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ⭐ | 87212 |
| Jaco Play | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ⭐ | ✅ | 1084 |
| Language Table | ✅ | ✅ | ⭐ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 442226 |
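To see which shards are currently published, you can enumerate the organization's datasets programmatically. A small sketch using `huggingface_hub`; the org handle is taken from the organization home link below, so adjust it if the shards live under a different namespace:

```python
# List all dataset repos under the organization on the Hugging Face Hub.
from huggingface_hub import list_datasets

for info in list_datasets(author="oxe-aug"):
    print(info.id)
```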
📦 How to Use (Example script to extract frames and robot states)
```python
#!/usr/bin/env python3
import csv
from pathlib import Path

import torch
from PIL import Image

from lerobot.datasets.lerobot_dataset import LeRobotDataset

# --- change me ---
n = 2  # 0-based episode index to extract
REPO_ID = "oxe-aug/jaco_play_test_0_108"
ROBOT_ENTITY = "google_robot"
KEYS = {
    "image": f"observation.images.{ROBOT_ENTITY}",
    "state": [
        f"observation.{ROBOT_ENTITY}.{field}"
        for field in (
            "base_orientation",
            "base_position",
            "ee_error",
            "ee_pose",
            "joints",
        )
    ],
}
OUT = Path(f"./episode_{n:06d}_frames")

ds = LeRobotDataset(REPO_ID, video_backend="pyav")  # simple video backend
ep_key = "episode_index"

# Find the n-th episode's [start, end) bounds by linear scan.
start = end = None
episode = -1
cur = None
i = 0
while i < len(ds):
    s = ds[i]
    if cur is None or s[ep_key] != cur:
        episode += 1
        cur = s[ep_key]
        if episode == n:
            start = i
            # Advance to the end of this episode.
            j = i
            while j < len(ds) and ds[j][ep_key] == cur:
                j += 1
            end = j
            break
    i += 1
if start is None:
    raise SystemExit(f"Episode {n} not found in {REPO_ID}")

OUT.mkdir(parents=True, exist_ok=True)


def to_uint8_hwc(x: torch.Tensor) -> torch.Tensor:
    """Convert a float CHW image in [0, 1] to a uint8 HWC tensor."""
    return (
        x.detach()
        .mul(255.0)
        .clamp(0, 255)
        .round()
        .to(torch.uint8)
        .permute(1, 2, 0)
        .contiguous()
        .cpu()
    )


def _flat_list(t: torch.Tensor) -> list:
    return t.detach().cpu().flatten().tolist()


csv_path = OUT / f"ep{n:06d}_{ROBOT_ENTITY}_state.csv"

# Build the CSV header: discover vector lengths from the episode's first frame.
first_sample = ds[start]
header = ["frame"]
for key in KEYS["state"]:
    vals = _flat_list(first_sample[key])
    header += [f"{key}[{i}]" for i in range(len(vals))]

# Save every frame in [start, end) as a PNG while writing CSV rows.
with open(csv_path, "w", newline="") as fcsv:
    writer = csv.writer(fcsv)
    writer.writerow(header)
    for frame, idx in enumerate(range(start, end)):
        # Reuse the already-loaded first sample instead of re-reading it.
        sample = first_sample if idx == start else ds[idx]
        im = to_uint8_hwc(sample[KEYS["image"]])
        Image.fromarray(im.numpy()).save(OUT / f"ep{n:06d}_f{frame:06d}.png")
        row = [frame]
        for key in KEYS["state"]:
            row += _flat_list(sample[key])
        writer.writerow(row)

total_frames = end - start
print(f"Saved {total_frames} PNGs to {OUT.resolve()}")
print(f"Saved {ROBOT_ENTITY} state CSV -> {csv_path.resolve()}")
```
🚀 Updates
- 2025-11: Released all 16 augmented datasets.
📖 Citation
If you use OXE-AUG datasets or tools, please cite:
```bibtex
@misc{ji2025oxeaug,
  title  = {OXE-Aug: A Large-Scale Robot Augmentation of OXE for Scaling Cross-Embodiment Policy Learning},
  author = {Ji, Guanhua and Polavaram, Harsha and Chen, Lawrence Yunliang and Bajamahal, Sandeep and Ma, Zehan and Adebola, Simeon and Xu, Chenfeng and Goldberg, Ken},
  year   = {2025},
  note   = {Manuscript}
}
```
Also cite upstream datasets you rely on (see per-shard READMEs for references).
🪪 License & Responsible Use
- Datasets: CC BY 4.0 (attribution required; state your modifications in derivatives).
- Code: Apache-2.0 / MIT (see each repository's LICENSE file).
- Responsible Use: No personal data; research/robotics use; do not deploy in unlawful or harmful contexts.
🤝 Contribute & Contact
- Contribute new shards or fixes via Issues/PRs on the corresponding dataset repos.
- Collaboration, permissions, media: jgh1013@seas.upenn.edu
- Organization home: https://huggingface.co/oxe-aug