
Lecture Details

Title: Towards Efficient Multimodal Intelligence on End Devices

Speaker: Yuan Yao, National University of Singapore
Time: 2024-09-19, 15:00-16:00
Venue: FIT 1-222

Abstract:

Multimodal Large Language Models (MLLMs) have fundamentally reshaped the landscape of AI. However, the sheer size of current MLLMs makes them impractical for much of research and industry. My recent research focuses on improving the knowledge density of MLLMs, building smaller and stronger models that can be deployed efficiently on end devices. Based on this research, we built MiniCPM-V, a series of efficient end-side MLLMs. The latest model, MiniCPM-V 2.6, has 8B parameters and outperforms GPT-4V in three major capabilities: single-image, multi-image, and video understanding.

In this talk, I will introduce the key research supporting MiniCPM-V from three major perspectives: (1) efficient model architecture, which supports high-resolution image encoding of up to 1.8 million pixels and facilitates efficient knowledge transfer to multi-image and video modeling; (2) efficient training strategy, which enables multimodal interaction in over 30 languages at low cost; (3) high-quality data construction, which reduces multimodal hallucinations through human/AI feedback.

MiniCPM-V has been well received by the community, ranking first on HuggingFace Trending for a week among 700k models (the other top-3 models in the same period included Meta's Llama3 and Microsoft's Phi-3-vision). It also ranked #1 on GitHub Trending and Papers With Code Trending Research. Since its release in Feb. 2024, the MiniCPM-V series has received over 11k stars on GitHub and 1.8 million downloads on open-source platforms.

Short Bio:

Yuan Yao is a postdoctoral researcher at the National University of Singapore. His research focuses on efficient multimodal large language models for deep vision-language understanding, and he leads the MiniCPM-V series of models. He received his Ph.D. from the Natural Language Processing Lab at Tsinghua University, and his Bachelor's degree from Tsinghua University. His work has been selected for ICLR Spotlight, ECCV Oral, and Nature Communications Editors' Highlights. He has received the Outstanding Doctoral Dissertation Award of the Wu Wenjun AI Science and Technology Award and the Intel China Academic Achievement Award, and was a Spotlight Recipient of the WAIC Yunfan Award.