Institute for Interdisciplinary Information Sciences, Tsinghua University

A journey of building SOTA VLM and beyond

Speaker: Yao Lu, NVIDIA
Time: 2024-09-18 09:00-2024-09-18 10:00
Venue: Tencent Meeting, ID 594-875-084
Abstract:

Visual Language Models (VLMs) have developed significantly over the past couple of years. The gap between open-source and closed-source models is narrowing, yet open research questions remain. In this talk, we will share how we built VILA into a SOTA VLM, along with some of the latest research built on top of VILA: how to self-augment a VLM (VILA^2), how to enable long context (LongVILA), and how to combine generation and understanding (VILA-U).

Biography:

Yao Lu is a principal research scientist at NVIDIA, where he leads the VILA project. Prior to that, Yao was a staff research manager at Google DeepMind, where he led the development of SayCan, RT-1 and RT-2. His research focuses on reinforcement learning, imitation learning, and VLMs. He has published at top ML and robotics conferences including CVPR, ICML, ICLR, NeurIPS, ICRA, CoRL, and RSS. Yao received his Ph.D. from the University of California, San Diego.