Publications

UFM: A Simple Path towards Unified Dense Correspondence with Flow [in submission]

Published in arXiv preprint, 2025

Dense image correspondence is central to many applications, such as visual odometry, 3D reconstruction, object association, and re-identification. Historically, dense correspondence has been tackled separately for wide-baseline scenarios and optical flow estimation, despite the common goal of matching content between two images. In this paper, we develop a Unified Flow & Matching model (UFM), which is trained on unified data for pixels that are co-visible in both source and target images. UFM uses a simple, generic transformer architecture that directly regresses the (u,v) flow. It is easier to train and more accurate for large flows compared to the typical coarse-to-fine cost volumes in prior work. UFM is 28% more accurate than state-of-the-art flow methods (Unimatch), while also having 62% less error and running 6.7x faster than dense wide-baseline matchers (RoMa). UFM is the first to demonstrate that unified training can outperform specialized approaches across both domains. This enables fast, general-purpose correspondence and opens new directions for multi-modal, long-range, and real-time correspondence tasks.
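
As an illustrative aside, the direct-regression idea in the abstract can be sketched in a few lines of PyTorch. This is not the UFM architecture; the module names, sizes, and decoding scheme below are assumptions chosen only to show a transformer regressing dense (u,v) flow without cost volumes.

```python
# A minimal, hypothetical sketch of direct (u, v) flow regression with a
# generic transformer. This is NOT the UFM architecture; module names,
# hyperparameters, and the decoding scheme are illustrative assumptions.
import torch
import torch.nn as nn

class DirectFlowRegressor(nn.Module):
    def __init__(self, patch=16, dim=256, depth=4, heads=8):
        super().__init__()
        self.patch = patch
        # Shared patch embedding for the source and target RGB images.
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        # Per-patch head regresses a (u, v) vector for every pixel in the patch.
        self.head = nn.Linear(dim, 2 * patch * patch)

    def forward(self, src, tgt):
        B, _, H, W = src.shape
        # Tokenize both images and attend over the joint sequence so source
        # tokens can match content in the target (positional encodings are
        # omitted for brevity; a real model needs them).
        s = self.embed(src).flatten(2).transpose(1, 2)  # (B, N, dim)
        t = self.embed(tgt).flatten(2).transpose(1, 2)
        x = self.encoder(torch.cat([s, t], dim=1))
        # Decode dense flow from the source-image tokens only.
        flow = self.head(x[:, : s.shape[1]])            # (B, N, 2*p*p)
        p = self.patch
        flow = flow.view(B, H // p, W // p, 2, p, p)
        flow = flow.permute(0, 3, 1, 4, 2, 5).reshape(B, 2, H, W)
        return flow  # per-pixel (u, v) displacement from source to target

flow = DirectFlowRegressor()(torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64))
print(flow.shape)  # torch.Size([1, 2, 64, 64])
```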

Recommended citation: Zhang, Yuchen, Nikhil Keetha, Chenwei Lyu, Bhuvan Jhamb, Yutian Chen, Yuheng Qiu, Jay Karhade et al. "UFM: A Simple Path towards Unified Dense Correspondence with Flow." arXiv preprint arXiv:2506.09278 (2025). https://uniflowmatch.github.io/

Vision-Guided Quadrupedal Locomotion in the Wild with Multi-Modal Delay Randomization

Published in IROS 2022, 2022

Developing robust vision-guided controllers for quadrupedal robots in complex environments with various obstacles, dynamic surroundings, and uneven terrain is very challenging. While Reinforcement Learning (RL) provides a promising paradigm for learning agile locomotion skills from vision inputs in simulation, it remains very challenging to deploy the RL policy in the real world. Our key insight is that, aside from the observation domain gap between simulation and the real world, the latency of the control pipeline is also a major cause of this difficulty. In this paper, we propose Multi-Modal Delay Randomization (MMDR) to address this issue when training RL agents. Specifically, we randomize the temporal selection of both proprioceptive states and visual observations, aiming to simulate the latency of the control system in the real world. We train the RL policy for end-to-end control in a physical simulator, and it can be directly deployed on a real A1 quadruped robot running in the wild. We evaluate our method in different outdoor environments with complex terrain and obstacles, and show that the robot can smoothly maneuver at high speed, avoid obstacles, and achieve significant improvements over the baselines.
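
For intuition, the delay randomization described above can be sketched as a thin wrapper around the simulator's observations. The buffering scheme, names, and delay ranges below are assumptions for illustration, not the paper's exact implementation.

```python
# Illustrative sketch of multi-modal delay randomization; variable names and
# the buffering scheme are assumptions, not the paper's implementation.
import random
from collections import deque

class DelayRandomizedObs:
    """Keep short histories of each modality and, at every control step,
    return a randomly delayed sample per modality to mimic real latency."""

    def __init__(self, max_proprio_delay=3, max_visual_delay=5):
        self.proprio_buf = deque(maxlen=max_proprio_delay + 1)
        self.visual_buf = deque(maxlen=max_visual_delay + 1)

    def step(self, proprio, visual):
        # Newest observation goes to the front of each buffer.
        self.proprio_buf.appendleft(proprio)
        self.visual_buf.appendleft(visual)
        # Sample an independent delay (in control steps) for each modality,
        # clipped to what the buffer currently holds.
        pd = random.randrange(len(self.proprio_buf))
        vd = random.randrange(len(self.visual_buf))
        return self.proprio_buf[pd], self.visual_buf[vd]

wrapper = DelayRandomizedObs()
for t in range(10):
    proprio, visual = wrapper.step(f"joint_state_{t}", f"depth_frame_{t}")
    # Feed the (possibly stale) observations to the policy here.
```

Sampling the two delays independently reflects the multi-modal aspect: in a real pipeline, the camera stack and the joint-state loop typically have different latencies.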

Recommended citation: Chieko Imai, Minghao Zhang, Yuchen Zhang, Marcin Kierebinski, Ruihan Yang, Yuzhe Qin, & Xiaolong Wang (2022). Vision-Guided Quadrupedal Locomotion in the Wild with Multi-Modal Delay Randomization. In 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). https://arxiv.org/abs/2109.14549