Toward Self-Sustaining Spatial AI: Agent, World, and Data

Speaker: Yiming Li, New York University
Time: 2024-12-25 10:00-2024-12-25 11:00
Venue: FIT 1-222
Abstract:

Spatial intelligence can enable robots to perceive, reason, and act within three-dimensional space and time. Traditional methods rely heavily on human input, such as manual annotations and hand-crafted simulators, to teach robots to perceive and navigate. However, they suffer from high cost, low efficiency, limited scalability, and a lack of sensor realism. In this talk, I will present my recent efforts to replicate human-like spatial cognition, empowering vision-only robots to efficiently learn and adapt within three-dimensional space and time using intrinsic objectives instead of external supervision. I will mainly introduce: (a) autonomous world digitization, where a visual agent reconstructs a photorealistic digital world from real-world RGB images via self-supervision, and (b) the digital world as a data engine, where the digital world synthesizes much more visual data to further enhance the spatial cognition of the agent. I will also briefly introduce our dataset and benchmark efforts for self-driving and robotics in collaboration with industry. My long-term vision is that embodied agents, sensory data, and digital worlds can form a virtuous cycle as a self-sustaining spatial AI system, capable of autonomously maintaining and improving itself with minimal external resources, paving the way toward autonomous intelligence.

Biography:

Yiming Li is a Dean’s PhD Fellow at New York University (NYU), where he is advised by Institute Associate Professor Chen Feng. He is also a Graduate Fellow at NVIDIA Research advised by Prof. Marco Pavone at Stanford University. He was a research intern at NVIDIA Applied AV Research in 2023 and NVIDIA AI Research in 2022, working with Prof. Anima Anandkumar, Dr. Jose M. Alvarez, Dr. Zhiding Yu, Prof. Yue Wang, Dr. Zan Gojcic, and Prof. Sanja Fidler. His research aims to advance spatial intelligence for robotics and embodied agents, with a focus on replicating human-like spatial cognition with minimal human supervision. His work has received 2,500+ citations and has been published in top-tier venues in robotics, computer vision, and machine learning, such as CVPR, ICCV, ECCV, NeurIPS, RSS, CoRL, and RA-L, with multiple first-author papers selected for NeurIPS Spotlight, CVPR Highlight, and ICCV Oral presentations. He received the prestigious NVIDIA Fellowship (2024-2025) and the NYU Future Leader Fellowship (2021-2023).