philfung/awesome-reliable-robotics


A curated collection of robotics papers focused on real-world reliability and robustness. I originally compiled this as a personal reference and am sharing it in the hope that it helps others.

Prerequisite for inclusion: every entry must report real-world results.

Contributions are welcome!


Name Date Real World Success Rate Categories Code Paper Project Organization(s) Key Insight Architecture
ARM: Advantage Reward Modeling for Long-Horizon Manipulation 04/2026 99.4% success rate on long-horizon towel folding. Rewards Code Paper Project LimX Dynamics Shifts from absolute progress to estimating relative advantage (interval gain) via a lightweight tri-state labeling strategy, naturally accommodating regressive and recovery behaviors. 1. MIMO Temporal Advantage Transformer predicts the entire advantage sequence in a single forward pass.
2. Advantage-Weighted Behavior Cloning adaptively reweights action chunks based on interval gains.
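The advantage-weighted behavior cloning step in the ARM entry above can be pictured as a reweighted imitation loss. Below is a minimal sketch, assuming an exponential weighting of each action chunk by its predicted interval gain; the function name, the weighting scheme, and all hyperparameters are illustrative, not taken from the paper.

```python
# Hedged sketch of advantage-weighted behavior cloning: each demonstrated
# action chunk's imitation loss is reweighted by the predicted interval gain
# (relative advantage) of that chunk, so regressive segments are down-weighted
# rather than discarded. The exponential weighting and clipping are assumptions.
import torch

def advantage_weighted_bc_loss(pred_actions, demo_actions, interval_advantage,
                               beta=1.0, max_weight=10.0):
    """pred_actions, demo_actions: (B, chunk_len, action_dim);
    interval_advantage: (B,) predicted gain for each action chunk."""
    per_chunk_bc = ((pred_actions - demo_actions) ** 2).mean(dim=(1, 2))  # (B,)
    weights = torch.exp(beta * interval_advantage).clamp(max=max_weight)  # (B,)
    weights = weights / weights.mean()            # keep the overall loss scale stable
    return (weights.detach() * per_chunk_bc).mean()
```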
GEN-1: Scaling Embodied Foundation Models to Mastery 04/2026 99% average success rate on tasks where prior models achieved 64%. Examples include folding t-shirts 86 times, folding boxes 200 times, and packing phones. Operates at ~3x the speed of prior SOTA. VLA, Foundation Models No Code : ( Article Generalist Scaling embodied foundation models trained primarily on human data crosses the mastery threshold (reliability, speed, and improvisation) for real-world tasks, requiring only ~1 hour of robot data per task. 1. Large multimodal model trained from scratch on 500,000+ hours of physical interaction data from wearable devices.
2. Task adaptation using ~1 hour of robot data via RL from experience and multimodal human guidance.
3. Real-time inference using Harmonic Reasoning and new forms of paged attention.
RL Token: Bootstrapping Online RL with Vision-Language-Action Models 03/2026 Across four challenging manipulation tasks (screwdriver, zip tie, Ethernet, charger), RLT substantially speeds up execution and improves success rates (e.g., from 20% to 65% for screw insertion). Full-task success rates improved by 40% on screwdriver and 60% on zip tie. Surpasses human teleoperation speed on the Ethernet task. Online RL, VLA No Code : ( Paper Project Physical Intelligence Isolating a task-relevant "RL token" from VLA embeddings enables stable, real-time online RL that can surpass human teleoperation speed in under an hour of training. 1. Frozen VLA ($\pi_0$) base perception.
2. Encoder-Decoder Transformer extracts compressed RL Token from internal VLA embeddings.
3. Lightweight actor-critic MLP heads.
4. Online RL fine-tunes actor-critic heads directly on robot.
5. Learned actor residuals refine/anchor the VLA's base actions.
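A rough sketch of the RL Token recipe described above: a frozen VLA supplies embeddings and base actions, a small encoder compresses the embeddings into a low-dimensional token, and lightweight actor-critic heads learn a residual correction anchored to the base action. Module names, dimensions, and the plain-MLP encoder (standing in for the paper's encoder-decoder transformer) are assumptions.

```python
# Minimal sketch under stated assumptions: the frozen VLA is queried elsewhere
# for embeddings and base actions; only these small heads are trained online.
import torch
import torch.nn as nn

class RLTokenHeads(nn.Module):
    def __init__(self, vla_embed_dim=2048, token_dim=64, action_dim=7):
        super().__init__()
        self.token_encoder = nn.Sequential(            # stand-in for the paper's
            nn.Linear(vla_embed_dim, 256), nn.ReLU(),  # encoder-decoder transformer
            nn.Linear(256, token_dim))
        self.actor = nn.Sequential(nn.Linear(token_dim, 128), nn.ReLU(),
                                   nn.Linear(128, action_dim), nn.Tanh())
        self.critic = nn.Sequential(nn.Linear(token_dim + action_dim, 128), nn.ReLU(),
                                    nn.Linear(128, 1))

    def act(self, vla_embedding, base_action, residual_scale=0.1):
        token = self.token_encoder(vla_embedding)
        residual = residual_scale * self.actor(token)   # small learned correction
        return base_action + residual                   # anchored to the VLA's base action

    def q_value(self, vla_embedding, action):
        token = self.token_encoder(vla_embedding)
        return self.critic(torch.cat([token, action], dim=-1))
```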
From Prior to Pro: Efficient Skill Mastery via Distribution Contractive RL Finetuning (DICE-RL) 03/2026 Belt Assembly: 93.3% SR (from 56.7%), Light Bulb Insertion: 90% SR (from 56.7%), Gear Insertion: 90% SR (from 46.7%). Online RL, Skill Mastery Sim Code Paper Project Stanford University Treats RL as a "distribution contraction" operator that sharpens pre-trained generative policies by amplifying successful behaviors through stable, sample-efficient online feedback. 1. Frozen diffusion or flow-based BC prior.
2. Lightweight residual actor-critic MLP heads.
3. Selective behavior regularization for stable fine-tuning.
4. Value-guided action selection to amplify high-reward behaviors.
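Value-guided action selection (item 4 above) can be read as a simple best-of-N filter over the frozen generative prior, which is one way the "distribution contraction" shows up at inference time. The sketch below assumes hypothetical `prior_sample` and `critic` interfaces standing in for the diffusion/flow prior and the residual critic.

```python
# Sketch of value-guided action selection as best-of-N over a frozen BC prior:
# sample several candidate action chunks, score them with the learned critic,
# and execute the highest-value candidate.
import torch

@torch.no_grad()
def value_guided_action(obs, prior_sample, critic, num_candidates=16):
    """obs: (obs_dim,) tensor; returns the highest-scoring candidate action chunk."""
    candidates = prior_sample(obs, num_samples=num_candidates)   # (N, chunk_len, act_dim)
    obs_batch = obs.unsqueeze(0).expand(num_candidates, -1)      # (N, obs_dim)
    scores = critic(obs_batch, candidates).squeeze(-1)           # (N,)
    return candidates[scores.argmax()]
```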
DexHiL: A Human-in-the-Loop Framework for Vision-Language-Action Model Post-Training in Dexterous Manipulation 03/2026 95% success on Tissue Extraction, 65% on Plush Toy Grasping. Outperforms baseline by 25% on average. VLA, Human-in-the-loop, Dexterous Manipulation No Code : ( Paper Project Shanghai Jiao Tong University, CASIA, Shanghai AI Laboratory First arm-hand human-in-the-loop framework for dexterous VLAs. Features an intervention-aware sampling and weighting strategy that prioritizes corrective segments during post-training to accelerate convergence and mitigate covariate shift. 1. Unified arm-hand teleoperation interface for coordinated human intervention.
2. Intervention-aware data sampling prioritizes corrective segments.
3. Post-training pipeline combines online HiL and offline data.
4. Weighted training mechanism emphasizes real-time expert corrections.
5. Two-stage hand joint retargeting for precise dexterous control.
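The intervention-aware sampling and weighting strategy in the DexHiL entry above (items 2 and 4) amounts to over-sampling and up-weighting segments that contain human corrections. A minimal sketch, with the boost factor and data layout as assumptions:

```python
# Illustrative sketch (not the authors' code): replay segments flagged as human
# interventions are drawn more often and carry larger loss weights, so
# post-training emphasizes recovery from the policy's own mistakes.
import numpy as np

def make_intervention_aware_sampler(segments, intervention_boost=4.0,
                                    rng=np.random.default_rng(0)):
    """segments: list of dicts, each with a boolean 'is_intervention' flag."""
    weights = np.array([intervention_boost if s["is_intervention"] else 1.0
                        for s in segments], dtype=np.float64)
    probs = weights / weights.sum()

    def sample(batch_size):
        idx = rng.choice(len(segments), size=batch_size, p=probs)
        # return segments plus per-sample weights for the weighted training objective
        return [segments[i] for i in idx], weights[idx]
    return sample
```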
Contact-Anchored Policies: Contact Conditioning Creates Strong Robot Utility Models 02/2026 Average zero-shot success with Picking, Opening, and Closing tasks across 4 robot arms; per-arm rates are shown in the CAP results figure.
Representation Learning Code Paper Project NYU, UC Berkeley, UCLA, Hello Robot, Ai2, University of Waterloo Replaces abstract language conditioning with precise 3D "contact anchors" to guide actions, enabling robust zero-shot generalization across environments and robot embodiments with minimal data. 1. Modular utility models factorized by task (pick, open, close).
2. Hindsight contact labeling identifies 3D coordinates during demonstration processing.
3. Contact prompting during inference via manual or VLM-generated visual anchors.
4. EgoGym simulation benchmark for rapid "real-to-sim" failure mode refinement.
TwinRL-VLA: Digital Twin-Driven Reinforcement Learning for Real-World Robotic Manipulation 02/2026 100% success rate across four tasks (Pick-and-Place, Insert-Hexagon-Block, Insert-Triple-Column-Block, Erase-Whiteboard). Converges in ~20 minutes with 30% speedup over prior methods (ConRFT: 77.2%, HIL-SERL: 71.25%). Online RL, VLA, Digital Twin Code (unofficial, by Yurong Jiang) Paper Project Peking University, Simplexity Robotics, Tsinghua University, HKUST A digital twin–real-world collaborative framework that expands the exploration space in simulation to broaden the trajectory distribution and accelerate online RL via sim-to-real guidance. 1. High-fidelity digital twins reconstructed from smartphone-captured real-world scenes.
2. Exploration space expansion in simulation to broaden the data distribution beyond SFT.
3. Sim-to-real guided exploration strategy to accelerate online fine-tuning.
4. Targeted human-in-the-loop interventions informed by failures identified in the twin.
LingBot-VA: Causal World Modeling for Robot Control 01/2026 Real-world Success Rate (SR) / Progress Score (PS): Make Breakfast 75% SR/97% PS, Pick Screws 70% SR/82.5% PS, Fold Clothes 35% SR/48.8% PS, Unpack Delivery 65% SR/84.5% PS, Insert Tubes 40% SR/85.8% PS, Fold Pants 70% SR/76.7% PS. Achieves >20% improvement over π0.5 on challenging tasks with only 50 demos. World Models Code Paper Project Ant Group/Alibaba Causal world modeling framework that unifies visual dynamics and action inference by interleaving video/action tokens in a single sequence, allowing the robot to "imagine" future outcomes. 1. Autoregressive diffusion framework (5.3B params) using flow matching.
2. Mixture-of-Transformers (MoT) architecture for shared latent space representation.
3. Unified autoregressive sequence of interleaved video and action tokens.
4. Asynchronous inference pipeline with KV Cache for real-time motor execution.
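Item 3 of the LingBot-VA entry above, the unified autoregressive sequence, can be illustrated by interleaving per-timestep video tokens and action tokens into a single sequence for the backbone. The helper below is a sketch under that assumption; token shapes are illustrative and not from the paper.

```python
# Minimal sketch: build one sequence [v_1, a_1, v_2, a_2, ...] so the same
# autoregressive backbone predicts both future frames and actions.
import torch

def interleave_video_action(video_tokens, action_tokens):
    """video_tokens: (T, Nv, D) per-frame visual tokens;
    action_tokens: (T, Na, D) per-step action tokens.
    Returns a single sequence of shape (T * (Nv + Na), D)."""
    chunks = []
    for frame_tokens, step_tokens in zip(video_tokens, action_tokens):
        chunks.append(frame_tokens)
        chunks.append(step_tokens)
    return torch.cat(chunks, dim=0)
```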
Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning 01/2026 93.6% average success on challenging real-world ALOHA bimanual manipulation tasks. With model-based planning, achieves 12.5% higher task completion rate on challenging real-world tasks. World Models Code Paper Project NVIDIA, Stanford University Leverages pre-trained video diffusion models to capture physical laws, using "latent frame injection" to unify proprioception, actions, and rewards within the video generation process. 1. Large latent video diffusion base (Cosmos-Predict2-2B).
2. Latent frame injection spatially broadcasts non-image data into the diffusion sequence.
3. Unified fine-tuning for joint action, future state, and value prediction.
4. Model-based planning via Best-of-N sampling using the internal world model.
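Latent frame injection (item 2 above) is described as spatially broadcasting non-image data into the video generation process. The module below is a sketch of one plausible reading: project a low-dimensional vector (proprioception, action, or value) to the latent channel dimension, tile it over the spatial grid, and append it as an extra latent frame. Shapes and the linear projection are assumptions.

```python
# Hedged sketch of "latent frame injection" for a latent video diffusion model.
import torch
import torch.nn as nn

class LatentFrameInjector(nn.Module):
    def __init__(self, vec_dim, latent_channels, latent_h, latent_w):
        super().__init__()
        self.proj = nn.Linear(vec_dim, latent_channels)
        self.latent_h, self.latent_w = latent_h, latent_w

    def forward(self, video_latents, vec):
        """video_latents: (B, T, C, H, W) latent video frames;
        vec: (B, vec_dim) non-image signal injected as one extra latent frame."""
        frame = self.proj(vec)                                            # (B, C)
        frame = frame[:, :, None, None].expand(-1, -1, self.latent_h, self.latent_w)
        return torch.cat([video_latents, frame.unsqueeze(1)], dim=1)      # (B, T+1, C, H, W)
```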
Does learning from experience benefit small AI robotics models? 12/2025 4/5 success when training a simple ACT policy on imitation + corrections only. Imitation Learning No Code : ( Article Shows that even small task-specific models (like ACT) benefit significantly from RL post-training by learning to distinguish between "good" and "bad" actions, which is essential for recovery behaviors. 1. Small, high-frequency base architecture (e.g., ACT) optimized for low-latency control.
2. RL-based post-training using advantage-guided updates.
3. Specialization on focused tasks to avoid the data/compute burden of large VLAs.
4. Focus on learning recovery motion speed and efficiency beyond initial BC demonstrations.
π*0.6 : a VLA That Learns From Experience 11/2025 The system ran for 13 hours straight making espresso drinks and over two hours folding novel laundry items without interruptions. Success Rates: Laundry (t-shirts & shorts) ~95%, Laundry (Diverse Hardest Items) ~70%, Make Espresso ~90%, Box Assembly ~90% VLA No Code : ( Paper Project Physical Intelligence RECAP (RL with Experience and Corrections via Advantage-conditioned Policies) enables VLAs to learn from their own successes and failures. By using an advantage value function to evaluate progress, the model extracts optimal behaviors even from suboptimal data, doubling throughput on complex tasks. 1. Gemma 3 4B VLM backbone with an 860M-parameter "action expert" module for high-frequency control.
2. RECAP training loop incorporates autonomous experience and teleoperated expert corrections.
3. Advantage-conditioned policy refinement uses a learned value function to prioritize high-reward actions.
4. Three-stage pipeline: Offline RL pre-training → Autonomous experience collection → RECAP policy optimization.
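One way to read the advantage-conditioned refinement in the entry above: a learned value function scores each logged action chunk, the result is quantized into an advantage token, and the policy is trained conditioned on that token so the "good" token can be fed at deployment. The sketch below makes that concrete under stated assumptions; `value_fn`, `policy.nll`, the threshold, and the binary token are placeholders, not the RECAP formulation itself.

```python
# Hedged sketch of advantage-conditioned training from logged experience.
import torch

def advantage_token(value_fn, obs, next_obs, reward, gamma=0.99, threshold=0.0):
    with torch.no_grad():
        adv = reward + gamma * value_fn(next_obs) - value_fn(obs)   # (B,) TD-style advantage
    return (adv > threshold).long()                                 # 1 = above-threshold chunk

def advantage_conditioned_bc_step(policy, value_fn, batch, optimizer):
    token = advantage_token(value_fn, batch["obs"], batch["next_obs"], batch["reward"])
    # policy is assumed to expose an nll(...) method that accepts a conditioning token
    loss = policy.nll(batch["actions"], obs=batch["obs"], advantage_token=token)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```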
RL-100: Performant Robotic Manipulation with Real-World Reinforcement Learning 10/2025 100% success across 7 tasks. 92.5% average zero-shot success on 3 tasks (without any retraining or fine-tuning), 86.7% average few-shot success on 3 tasks. Online RL Code (unofficial, by Yanjie Ze) Paper Project Shanghai Qizi, Shanghai Jiao Tong, HKU, UNC Chapel Hill Unifies IL and RL under a single PPO-style objective within the diffusion denoising process, achieving 100% success across diverse tasks by pushing performance beyond human demonstrations. 1. Three-stage pipeline: IL Pre-training → Iterative Offline RL (gated by OPE) → Online RL Fine-tuning.
2. Lightweight consistency distillation procedure to compress multi-step diffusion into a one-step controller.
3. Self-supervised visual encoder for stable, drift-resistant representations during RL.
4. Diffusion-based visuomotor backbone supporting single-action and action-chunking control.
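Item 2 of the RL-100 entry above compresses the multi-step diffusion policy into a one-step controller. The sketch below uses a plain teacher-to-student distillation step as a simplified stand-in; the paper's consistency distillation objective differs in detail, and `teacher.denoise` / `student` are placeholder interfaces.

```python
# Simplified distillation stand-in for a one-step controller: the student is
# regressed onto the action produced by the multi-step diffusion teacher.
import torch
import torch.nn.functional as F

def distill_one_step(student, teacher, obs_batch, optimizer, teacher_steps=20):
    with torch.no_grad():
        target_actions = teacher.denoise(obs_batch, num_steps=teacher_steps)  # slow, iterative
    pred_actions = student(obs_batch)                                          # single forward pass
    loss = F.mse_loss(pred_actions, target_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```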
APO: Human-assisted Robotic Policy Refinement via Action Preference Optimization 10/2025 Improves success rates over DAgger, TPO, and related baselines, both in-distribution and under position, background, and texture perturbations. Human-in-the-loop Code Paper Project ByteDance
HI-ORS: Human-in-the-loop Online Rejection Sampling for Robotic Manipulation 10/2025 Improved real-world success rates vs. vanilla BC, HIL-SERL, and Q-Chunking. Human-in-the-loop Code Paper Project Tencent
ARMADA/FLOAT: Autonomous Online Failure Detection and Human Shared Control Empower Scalable Real-world Deployment and Adaptation 10/2025 Failure detector FLOAT achieves nearly 95% accuracy on average, surpassing prior SOTA failure detection approaches by >20%. Code Paper Project Shanghai Jiao Tong University
SARM: Stage-Aware Reward Modeling for Long Horizon Robot Manipulation 09/2025 83% success on folding T-shirts (flattened), 67% success on folding T-shirts (crumpled). Surpasses vanilla BC (8% and 0%). Rewards Code (lerobot) Paper Project Stanford, UC Berkeley, xdof.ai Provides stable, grounded progress signals for long-horizon tasks by jointly predicting discrete task stages and fine-grained progress, avoiding brittle frame-index-based rewards. 1. Dual-head transformer architecture (Stage Estimator + Subtask Estimator) with a shared multimodal backbone.
2. Processes multimodal sequences (CLIP + proprioception) to capture temporal task dependencies.
3. Derives rewards from natural language subtask annotations for consistent progress estimation.
4. Integrated with Reward-Aligned Behavior Cloning (RA-BC) for high-precision deformable manipulation.
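The dual-head design in the SARM entry above (item 1) can be sketched as a stage classifier plus a within-stage progress regressor whose outputs are combined into one monotonic progress signal. The combination rule below is an assumption; SARM's exact formulation may differ.

```python
# Hedged sketch of a stage-aware reward head: overall progress is the predicted
# discrete stage plus fine-grained within-stage progress, normalized to [0, 1].
import torch
import torch.nn as nn

class StageAwareReward(nn.Module):
    def __init__(self, feat_dim, num_stages):
        super().__init__()
        self.num_stages = num_stages
        self.stage_head = nn.Linear(feat_dim, num_stages)                 # discrete stage logits
        self.progress_head = nn.Sequential(nn.Linear(feat_dim, 1), nn.Sigmoid())

    def forward(self, features):
        """features: (B, feat_dim) from the shared multimodal backbone."""
        stage = self.stage_head(features).argmax(dim=-1).float()          # (B,)
        progress = self.progress_head(features).squeeze(-1)               # (B,) in [0, 1]
        return (stage + progress) / self.num_stages                       # overall progress in [0, 1]
```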
Dual-Actor Fine-Tuning of VLA Models: A Talk-and-Tweak Human-in-the-Loop Approach 09/2025 100% success across three tasks within 101 minutes of online fine-tuning. For long-horizon tasks, it sustains a 50% success rate over 12 consecutive operations. VLA No Code : ( Paper Project Zhejiang & others Introduces a "talk-and-tweak" scheme that translates physical human corrections into semantically grounded language commands, enabling efficient and interpretable policy adjustments. 1. Dual-actor system: Primary Actor for multi-task base actions + Refinement Actor for fine-grained adjustments.
2. Refinement Actor operates in the policy's latent noise space, guided by natural-language refinement commands.
3. "Talk-and-Tweak" interface converts real-time tweaks into semantic "talk" instructions.
4. RL-based adaptation loop updates the refinement actor to align with human expert corrections.
WSRL: Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data 07/2025 100% success rate on the Franka peg insertion task in 18 minutes, whereas SERL fails (0/20) even with 50 minutes. Online RL Code Paper Project UC Berkeley Demonstrates that retaining offline data during fine-tuning is unnecessary if a "warm-start" phase is used to recalibrate the Q-function, preventing catastrophic forgetting while significantly reducing computational costs. 1. Policy and value function initialization via offline RL pre-training.
2. Warmup phase collects online rollouts using the frozen offline policy.
3. Recalibration of the offline Q-function to the online distribution during warmup.
4. Standard online RL fine-tuning with high update-to-data ratios for accelerated learning.
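A compact sketch of the WSRL pipeline above, assuming placeholder `agent`/`env` interfaces: collect warmup rollouts with the frozen offline policy, let the critic recalibrate on that online data, then continue with standard high-UTD fine-tuning without ever replaying offline data. The exact scheduling of updates during warmup is an assumption of this sketch.

```python
# Placeholder-interface sketch of warm-started online fine-tuning; agent and env
# are hypothetical objects, not a released API.
def warm_start_finetune(env, agent, warmup_steps=5_000, online_steps=50_000, utd=4):
    buffer = []                                        # online-only replay buffer
    obs = env.reset()
    for step in range(warmup_steps + online_steps):
        in_warmup = step < warmup_steps
        action = agent.offline_policy(obs) if in_warmup else agent.policy(obs)
        next_obs, reward, done, _ = env.step(action)
        buffer.append((obs, action, reward, next_obs, done))
        obs = env.reset() if done else next_obs
        for _ in range(utd):                           # high update-to-data ratio
            batch = agent.sample(buffer)
            agent.update_critic(batch)                 # recalibrates the offline Q-function
            if not in_warmup:
                agent.update_actor(batch)              # actor updates begin after warmup
```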
Dyna Robotics 07/2025 99.9% success rate in folding towels for 8 hours/day over 3 days (dropped 1 towel on day 2). No intervention. No Code : ( Project Dyna Robotics
Figure (Helix) 06/2025 ~95% accuracy at correctly orienting barcodes. 4.05 seconds per package. No Code : ( Project Figure Adds memory for more robust, long-term tasks and force feedback for improved grip.
RSS 2025 Workshop: Human-in-the-Loop Robot Learning: Teaching, Correcting, and Adapting 06/2025 Various results No Code : ( Project Various universities
Compliant Residual DAgger: Improving Real-World Contact-Rich Manipulation with Human Corrections 06/2025 Book-flipping success rate of 100% (60% improvement) and belt assembly success rate of 70% (50% improvement). Human-in-the-loop Code Paper Project Stanford
ReWiND: Language-Guided Rewards Teach Robot Policies without New Demonstrations 05/2025 An hour of real-world RL improves success rate from 12% to 68%, vs. 8% to 10% with VLC. Rewards Code Paper Project U Wash
Dyna Robotics DYNA-1 Model 04/2025 99.4% success rate in folding napkins over 24 hours. No intervention. No Code : ( Project Dyna Robotics
ConRFT: A Reinforced Fine-tuning Method for VLA Models via Consistency Policy 02/2025 96.3% avg success rate across tasks, compared to 31.9% w/ HIL-SERL. VLA Code Paper Chinese Academy of Sciences Integrates RL with consistency distillation to simultaneously boost VLA reliability and inference speed, enabling single-step action generation that is 5-10x faster than iterative diffusion. 1. Offline Calibrated Q-Learning (Cal-QL) stabilizes value estimation on small demo sets.
2. Human-in-the-loop online RL (HIL-ConRFT) enables rapid real-world adaptation.
3. Unified consistency-based training objective used across both offline and online stages.
4. Consistency distillation transforms iterative diffusion into high-frequency, single-step inference.
HIL-SERL: Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning 10/2024 100% success rate on a variety of tasks. Online RL Code (official; sim, lerobot) Paper Project UC Berkeley Integrates real-time human corrections with sample-efficient RL to master complex, vision-based manipulation directly in the real world, achieving 100% success on diverse tasks in under 3 hours of training. 1. Asynchronous Actor, Learner, and Replay Buffer processes.
2. Real-time human intervention via SpaceMouse for reactive corrections during exploration.
3. Learned binary reward classifier trained on teleoperated positive/negative samples.
4. Off-policy RL (SAC) optimized for high-frequency real-world interaction.
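The HIL-SERL interaction loop above can be sketched as a single-process simplification (the real system runs actor, learner, and replay buffer asynchronously): the human device can override the policy's action at any step, and the reward comes from the learned binary success classifier. All interfaces below are placeholders, not the released API.

```python
# Hedged sketch of human-in-the-loop data collection for off-policy RL.
def collect_hil_episode(env, policy, reward_classifier, human_device, buffer, max_steps=200):
    obs = env.reset()
    for _ in range(max_steps):
        action = policy(obs)
        human_action = human_device.read()          # e.g., SpaceMouse; None when idle
        if human_action is not None:
            action = human_action                   # human correction overrides the policy
        next_obs, _, done, _ = env.step(action)
        reward = float(reward_classifier(next_obs["image"]) > 0.5)   # learned binary reward
        buffer.add(obs, action, reward, next_obs, done,
                   intervened=human_action is not None)
        if done or reward > 0:
            break
        obs = next_obs
```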
RLIF: Interactive Imitation Learning as Reinforcement Learning 03/2024 95% success rate in cloth unfolding within 7 rounds, 100% success rate in peg insertion within 6 rounds. Imitation Learning Code Paper Project UC Berkeley
SERL: A Software Suite for Sample-Efficient Robotic Reinforcement Learning 01/2024 100% success on PCB insertion, cable routing, object relocation Online RL Code Paper Project UC Berkeley
