Speaker: Yao Lu, NVIDIA
Time: 2024-09-18, 09:00–10:00
Venue: Tencent Meeting: 594-875-084
Abstract:
Visual Language Models (VLMs) have undergone significant development in the past couple of years. The gap between open-source and closed-source models is narrowing, while open research questions remain. In this talk, we will share the journey of how we built VILA into a state-of-the-art VLM, as well as some of the latest research built on top of VILA, such as how to self-augment a VLM (VILA^2), how to enable long context (LongVILA), and how to combine generation and understanding (VILA-U).
Short Bio:
Yao Lu is a principal research scientist at NVIDIA, leading the VILA project. Prior to that, Yao was a staff research manager at Google DeepMind, where he led the development of SayCan, RT-1, and RT-2. His research focuses on reinforcement learning, imitation learning, and VLMs. He has published at top ML and robotics conferences, including CVPR, ICML, ICLR, NeurIPS, ICRA, CoRL, and RSS. Yao received his PhD from the University of California, San Diego.