Interactive 3D scenes are increasingly vital for embodied intelligence, yet existing datasets remain limited due to the labor-intensive process of annotating part segmentation, kinematic types, and motion trajectories. We present REACT3D, a scalable zero-shot framework that converts static 3D scenes into simulation-ready interactive replicas with consistent geometry, enabling direct use in diverse downstream tasks.
Clone the repository and install the necessary dependencies:

```bash
git clone https://github.com/troyehuang/REACT3D.git --recursive
cd REACT3D
conda create -n react3d python=3.10
conda activate react3d

# change for your machine
export TORCH_CUDA_ARCH_LIST="7.5 8.0 8.6 8.9"
pip install ninja
pip install torch==2.1.2+cu121 torchvision==0.16.2+cu121 --extra-index-url https://download.pytorch.org/whl/cu121

# GroundingDINO
export BUILD_WITH_CUDA=True
export CUDA_HOME=<your_path>
export AM_I_DOCKER=False
cd grounded_sam/GroundingDINO
python setup.py build
python setup.py install

# scene2part
cd ../..
pip install git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch --no-build-isolation
pip install "git+https://github.com/facebookresearch/pytorch3d.git@stable" --no-build-isolation
pip install kaolin -f https://nvidia-kaolin.s3.us-east-2.amazonaws.com/torch-2.1.2_cu121.html
pip install git+https://github.com/NVlabs/nvdiffrast.git --no-build-isolation
pip install -r requirements.txt

# part2interactive
pip install 'git+https://github.com/facebookresearch/detectron2.git' --no-build-isolation
cd opdformer/mask2former/modeling/pixel_decoder/ops
python setup.py build install
```
We updated the code to use Qwen3.5 instead of LLaVA. Install the environment for Qwen3.5:

```bash
conda create -n react3d_qwen python=3.10
conda activate react3d_qwen
pip install transformers
```

Install checkpoints:
```bash
cd REACT3D
# ram++ checkpoint
cd ram++
wget --no-check-certificate https://huggingface.co/xinyu1205/recognize-anything-plus-model/resolve/main/ram_plus_swin_large_14m.pth

# grounded sam checkpoints
cd ../grounded_sam
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth

# opdm checkpoint
cd ../part2interactive
wget --no-check-certificate https://huggingface.co/3dlg-hcvc/opdmulti-motion-state-rgb-model/resolve/main/pytorch_model.pth -O {REACT3D_dir}/part2interactive/opdm_rgb.pth
```

We evaluate our work on ScanNet++, augmented by the Articulated3D and MultiScan datasets. For convenience, we provide an example input scene and the corresponding output. An interactive 3D demo of this output is shown on our project page.
Note: For computational efficiency, the provided example scene has been downsampled: the image resolution is reduced by half, and the frame sequence is sub-sampled by extracting every 5th frame. Please be aware that these factors may impact the final quality of the reconstructed results.
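The downsampling described above (half-resolution images, every 5th frame) can be reproduced for your own captures. A minimal stdlib sketch; the frame naming scheme and the 1920x1440 input resolution are illustrative assumptions, not values taken from the dataset:

```python
def subsample_frames(frame_names, stride=5):
    """Keep every `stride`-th frame (here: every 5th), starting from frame 0."""
    return frame_names[::stride]

def halved_resolution(width, height):
    """Halve each image dimension, as in the provided example scene."""
    return width // 2, height // 2

# Hypothetical frame list and resolution, for illustration only.
frames = [f"frame_{i:05d}.jpg" for i in range(20)]
kept = subsample_frames(frames)        # keeps frames 0, 5, 10, 15
w, h = halved_resolution(1920, 1440)   # (960, 720)
print(kept)
print(w, h)
```

The actual resizing of pixels would be done with an image library of your choice; the sketch only shows the selection and size arithmetic.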
To use custom data, please follow the structure of example_input_scene to process your own scenes. Make sure your input data format matches the example data.
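Before launching the pipeline on a custom scene, it can help to check the folder layout programmatically. A minimal stdlib sketch, assuming the entries of the example scene layout (`images_2`, `mesh_aligned_0.05.ply`, `pose_intrinsic_imu.json`, `depth`); this checker is not part of the repo:

```python
from pathlib import Path

# Entries expected in an input scene folder, mirroring example_input_scene.
REQUIRED = ["images_2", "mesh_aligned_0.05.ply", "pose_intrinsic_imu.json", "depth"]

def missing_entries(scene_dir):
    """Return the expected files/folders that are absent from scene_dir."""
    root = Path(scene_dir)
    return [name for name in REQUIRED if not (root / name).exists()]

# Usage sketch: build a dummy scene folder and check it.
import tempfile
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "images_2").mkdir()
    (root / "depth").mkdir()
    (root / "mesh_aligned_0.05.ply").touch()
    print(missing_entries(root))  # pose_intrinsic_imu.json is still missing
```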
The example_input_scene folder is organized as follows:

```
example_input_scene
|---images_2
|---mesh_aligned_0.05.ply
|---pose_intrinsic_imu.json
|---depth
```

To run the script on a specific scene, use:
```bash
cd REACT3D
# remember to change the paths in each script
cd scene2part
bash scene2part.sh
cd ../part2interactive
bash part2interactive.sh
cd ../texture
bash generate_texture.sh
# generate URDF files and ROS files
cd ../simulation_ready
bash simulation_ready.sh
```
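The final stage emits URDF files describing each articulated part. To illustrate what such a file contains, here is a hedged stdlib sketch that assembles a minimal URDF with one revolute joint; all link/joint names, the axis, and the limits are hypothetical, not the actual output of simulation_ready.sh:

```python
import xml.etree.ElementTree as ET

def make_revolute_urdf(joint_name="door_hinge", lower=0.0, upper=1.57):
    """Build a minimal URDF string with one revolute joint between two links.
    Names and limits here are illustrative, not REACT3D's real output."""
    robot = ET.Element("robot", name="example_part")
    ET.SubElement(robot, "link", name="base")
    ET.SubElement(robot, "link", name="door")
    joint = ET.SubElement(robot, "joint", name=joint_name, type="revolute")
    ET.SubElement(joint, "parent", link="base")
    ET.SubElement(joint, "child", link="door")
    ET.SubElement(joint, "axis", xyz="0 0 1")
    ET.SubElement(joint, "limit",
                  lower=str(lower), upper=str(upper), effort="10", velocity="1")
    return ET.tostring(robot, encoding="unicode")

urdf = make_revolute_urdf()
print(urdf)
```

A file like this (with real meshes and estimated joint parameters) is what ROS or a physics simulator loads to make the part interactive.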
Texture generation is only needed if the input mesh lacks UVs and the simulator does not support vertex colors. It may be slow for the remaining (static) scene mesh due to the complexity of UV parameterization (via xatlas).
```bash
cd REACT3D/visualization

# if you want to visualize the output directly after running
# part2interactive.sh (without running generate_texture.sh):
# Viser visualization where you can interact with the scene
python vis_interactive.py --scene_dir <path_to_scene_dir>
# basic Open3D visualization with joint arrows; this doesn't support manipulation
python vis_result_part.py --scene_dir <path_to_scene_dir>

# if you want to visualize the simulation-ready output with Viser
# after running generate_texture.sh and simulation_ready.sh:
python vis_interactive_urdf.py --scene_dir <path_to_scene_dir>
# or you can import the simulation_ready.sh results into ROS to visualize them
```
- Project Page online
- Initial code released
- Example input scene and output data
- Performance optimization for Scene2Part module
- Upgrade from LLaVA to Qwen3.5
- Viser interface
- Visualization guide
- Texture
Our work is based on OPDMulti and DRAWER. We thank the authors for their great work and for open-sourcing their code.
If you find our work useful, please consider citing:
@ARTICLE{11434845,
author={Huang, Zhao and Sun, Boyang and Delitzas, Alexandros and Chen, Jiaqi and Pollefeys, Marc},
journal={IEEE Robotics and Automation Letters},
title={REACT3D: Recovering Articulations for Interactive Physical 3D Scenes},
year={2026},
volume={11},
number={5},
pages={5954-5961},
keywords={Three-dimensional displays;Joints;Geometry;Image reconstruction;Estimation;Solid modeling;Point cloud compression;Foundation models;Biological system modeling;Accuracy;Semantic scene understanding;object detection;segmentation and categorization;RGB-D perception},
doi={10.1109/LRA.2026.3674028}
}
![Teaser Figure](/troyehuang/REACT3D/raw/main/assets/teaser.png)