| Format | Read | Write | Notes |
|---|---|---|---|
| RLDS | ✅ | ✅ | Open X-Embodiment, OpenVLA, Octo |
| LeRobot v2 | ✅ | - | HuggingFace robotics |
| LeRobot v3 | ✅ | ✅ | Target format for all conversions |
| Zarr | ✅ | - | Diffusion Policy datasets |
| HDF5 | ✅ | - | robomimic, ACT/ALOHA datasets |
| ROS bags | ✅ | - | Raw robot recordings |
- Format auto-detection
- CLI (`forge inspect`, `convert`, `visualize`, `formats`, `hub`)
- YAML configuration files
- Parallel episode processing (`--workers N`)
- Unified visualizer
- HuggingFace Hub integration (`hf://` URLs)
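
As a rough illustration of what format auto-detection can look like, the sketch below guesses a format from the file names found in a dataset directory. The function name `guess_format` and the specific heuristics are assumptions made for this example, not forge's actual detection rules.

```python
from pathlib import Path

# Hypothetical suffix table; these hints are assumptions for illustration.
SUFFIX_HINTS = {
    ".bag": "rosbag",
    ".hdf5": "hdf5",
    ".h5": "hdf5",
    ".zarr": "zarr",
}


def guess_format(names):
    """Guess a dataset format from relative file or directory names."""
    paths = [Path(n) for n in names]
    # RLDS datasets built with TFDS are stored as TFRecord shards.
    if any(".tfrecord" in p.name for p in paths):
        return "rlds"
    # LeRobot datasets keep their metadata in meta/info.json.
    if any(p.as_posix().endswith("meta/info.json") for p in paths):
        return "lerobot"
    for p in paths:
        for part in p.parts:  # also catches directories such as data.zarr
            suffix = Path(part).suffix
            if suffix in SUFFIX_HINTS:
                return SUFFIX_HINTS[suffix]
    return "unknown"
```

A real detector would likely also inspect file contents (for example, to tell LeRobot v2 from v3), which this name-based sketch does not attempt.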
Status: Done

    # Download and inspect
    forge inspect hf://lerobot/pusht
    # Download and convert
    forge convert hf://openvla/modified_libero_rlds output/ --format lerobot-v3
    # Search datasets
    forge hub "robot manipulation"
    forge hub --author lerobot

Features:
- `hf://` URL scheme for HuggingFace datasets
- Automatic download and caching (`~/.cache/forge/datasets/`)
- Dataset search/discovery (`forge hub` command)
- Streaming support for large datasets (future)
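
To make the `hf://` scheme and the cache location more concrete, here is a minimal sketch of how such a URL could be turned into a repository id and a cache directory. The helpers `parse_hf_url` and `cache_dir_for`, and the `org--name` directory layout, are hypothetical; only the `hf://` prefix and the `~/.cache/forge/datasets/` root come from the list above.

```python
from pathlib import Path

CACHE_ROOT = Path("~/.cache/forge/datasets")  # cache root named in the feature list


def parse_hf_url(url):
    """Split 'hf://<org>/<dataset>' into a repo id; raise on anything else."""
    prefix = "hf://"
    if not url.startswith(prefix):
        raise ValueError(f"not an hf:// URL: {url!r}")
    repo_id = url[len(prefix):].strip("/")
    if repo_id.count("/") != 1:
        raise ValueError(f"expected hf://<org>/<dataset>, got {url!r}")
    return repo_id


def cache_dir_for(repo_id):
    # Assumed layout: one directory per repo, with '/' flattened to '--'.
    return CACHE_ROOT.expanduser() / repo_id.replace("/", "--")


repo = parse_hf_url("hf://lerobot/pusht")
print(repo, cache_dir_for(repo))
```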
Status: Done

    # Convert LeRobot to RLDS
    forge convert lerobot_dataset/ output/ --format rlds
    # Round-trip conversion (RLDS → LeRobot v3 → RLDS)
    forge convert original_rlds/ temp/ --format lerobot-v3
    forge convert temp/ reconverted_rlds/ --format rlds

Features:
- Write RLDS format (TFRecord)
- Generate proper episode metadata (dataset_info.json, features.json)
- Support OXE-compatible schema
- Round-trip conversion verified
Why: Ensure converted datasets are correct and complete.

    forge validate output/ --check-frames --check-videos

- Frame count validation
- Video integrity checks
- Schema compatibility checks
- Compare source vs converted statistics
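
The frame-count and source-versus-converted checks listed above could be sketched roughly as follows. This is an illustrative comparison over in-memory data, not the logic behind `forge validate`; the function name `compare_episodes` and the nested-list data layout are assumptions.

```python
import math


def compare_episodes(source, converted, tol=1e-6):
    """Return a list of human-readable mismatches between two datasets.

    Each dataset is a list of episodes; each episode is a list of
    per-frame action vectors (lists of floats).
    """
    problems = []
    if len(source) != len(converted):
        problems.append(f"episode count: {len(source)} != {len(converted)}")
    for i, (src, dst) in enumerate(zip(source, converted)):
        if len(src) != len(dst):
            problems.append(f"episode {i}: frame count {len(src)} != {len(dst)}")
            continue
        for t, (a, b) in enumerate(zip(src, dst)):
            same_shape = len(a) == len(b)
            if not same_shape or any(
                not math.isclose(x, y, abs_tol=tol) for x, y in zip(a, b)
            ):
                # Report only the first bad frame per episode to keep output short.
                problems.append(f"episode {i}, frame {t}: action mismatch")
                break
    return problems
```

An empty result means the two datasets agree within the tolerance. Video integrity and schema checks would need format-specific readers and are left out of this sketch.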
- Large-scale robot manipulation dataset
- Already converted to LeRobot: `IPEC-COMMUNITY/droid_lerobot`
- 60k+ trajectories from WidowX robots
- Core component of Octo training
- Simulation benchmark used by OpenVLA and Pi0
- Pre-converted versions exist on HF Hub
- Lazy loading for datasets > RAM
- Chunked processing
- Resume interrupted conversions
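
One way chunked processing and resumable conversion can fit together is to record finished episodes after every chunk and skip them on the next run. The sketch below assumes a JSON progress file and a caller-supplied `convert_one` callback; both are hypothetical and not part of forge.

```python
import json
from pathlib import Path


def chunks(items, size):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def convert_resumable(episode_ids, convert_one, state_file, chunk_size=2):
    """Convert episodes chunk by chunk, recording progress after each chunk.

    Returns the number of episodes that still had to be converted.
    """
    state_path = Path(state_file)
    done = set(json.loads(state_path.read_text())) if state_path.exists() else set()
    pending = [e for e in episode_ids if e not in done]
    for chunk in chunks(pending, chunk_size):
        for episode_id in chunk:
            convert_one(episode_id)
        done.update(chunk)
        # Persist after every chunk so an interrupted run can pick up here.
        state_path.write_text(json.dumps(sorted(done)))
    return len(pending)
```

If a run is interrupted partway through a chunk, that chunk is redone on the next run, so `convert_one` should be safe to repeat for the same episode.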

    forge stats dataset/ --plot
    forge stats dataset/ --sample 100 --output stats.json

- Action/state distributions (min, max, mean, std per dimension)
- Episode length histograms
- Coverage metrics (language, success labels, rewards)
- JSON export for programmatic access
- Matplotlib visualization with the `--plot` flag
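
The per-dimension statistics listed above (min, max, mean, std) can be computed with the standard library alone. The sketch below is an illustration, not the `forge stats` implementation; the function name `action_stats` and the choice of population standard deviation are assumptions.

```python
import json
import statistics


def action_stats(actions):
    """Compute min, max, mean, and std for each action dimension.

    `actions` is a non-empty list of equal-length vectors.
    """
    columns = list(zip(*actions))  # transpose: one tuple per dimension
    return {
        "min": [min(c) for c in columns],
        "max": [max(c) for c in columns],
        "mean": [statistics.fmean(c) for c in columns],
        # Population standard deviation; a sample std would use stdev().
        "std": [statistics.pstdev(c) for c in columns],
    }


stats = action_stats([[0.0, 1.0], [2.0, 1.0], [4.0, 1.0]])
print(json.dumps(stats, indent=2))  # JSON form, suitable for export
```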

    forge upload output/ --repo my-org/my-dataset

- Image augmentations during conversion
- Action noise injection
- Temporal subsampling
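
Two of the augmentations above, temporal subsampling and action noise injection, are simple enough to sketch directly. The function names, the fixed-stride strategy, and the Gaussian noise model are assumptions for illustration; the roadmap does not specify how forge would implement them.

```python
import random


def subsample(frames, stride):
    """Keep every `stride`-th frame, always starting with the first."""
    if stride < 1:
        raise ValueError("stride must be >= 1")
    return frames[::stride]


def add_action_noise(actions, sigma, seed=0):
    """Return a copy of `actions` with Gaussian noise of std `sigma` added."""
    rng = random.Random(seed)  # seeded so repeated conversions match
    return [[a + rng.gauss(0.0, sigma) for a in action] for action in actions]
```

Seeding the generator keeps a conversion reproducible, which matters if augmented datasets are later compared against their source.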
- HDF5 reader (robomimic, ACT/ALOHA) ✅ Done
- MuJoCo dataset support
- Isaac Gym recordings
| Model | Training Format | Fine-tune Format | Forge Support |
|---|---|---|---|
| OpenVLA | RLDS | RLDS | ✅ Read |
| Octo | RLDS | RLDS | ✅ Read |
| Pi0/OpenPI | Proprietary | LeRobot v2 | ✅ Read/Write |
| ACT | HDF5 | HDF5 | ✅ Read |
| Diffusion Policy | Zarr | Zarr | ✅ Read |
- Open X-Embodiment - 1M+ trajectories, RLDS format
- OpenVLA - Uses RLDS, fine-tunes on RLDS
- Octo - Uses RLDS from OXE
- OpenPI (Pi0) - Uses LeRobot for fine-tuning
- LeRobot - HuggingFace robotics standard