
Converge or Diverge? A Story of Adam

Speaker: Yushun Zhang, The Chinese University of Hong Kong, Shenzhen
Time: 2023-09-28, 09:00-10:00
Venue: Tencent Meeting 952-254-027

Abstract:

Adam is one of the most popular algorithms in deep learning, used in many important applications including ChatGPT. Despite its popularity, the theoretical properties of Adam were largely unknown, and how to tune Adam was not clear. The ICLR 2018 Best Paper by Reddi et al. (2018) pointed out that Adam can diverge, and since then many variants of Adam have been proposed. However, vanilla Adam remains exceptionally popular and works well in practice. Why is there a gap between theory and practice? We point out an important mismatch between the settings of theory and practice: Reddi et al. (2018) pick the problem after picking the hyperparameters of Adam, i.e., (beta1, beta2), while practical applications often fix the problem first and then tune (beta1, beta2). We conjecture that in the latter, practical setting, i.e., where tuning the hyperparameters is allowed, Adam can converge. In this talk, we present our recent findings that confirm this conjecture. More specifically, we show that when beta2 is large enough and beta1 < sqrt(beta2), Adam can converge.
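For readers less familiar with the notation, the sketch below shows where beta1 and beta2 enter the vanilla Adam update (Kingma & Ba, 2015): beta1 controls the momentum (first-moment) average and beta2 the adaptive-scale (second-moment) average. The default values shown are the common ones, not necessarily those analyzed in the talk.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One vanilla Adam update; m and v are the running moment estimates, t >= 1."""
    m = beta1 * m + (1 - beta1) * grad        # first moment: momentum average of gradients
    v = beta2 * v + (1 - beta2) * grad**2     # second moment: average of squared gradients
    m_hat = m / (1 - beta1**t)                # bias correction for the warm-up phase
    v_hat = v / (1 - beta2**t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v
```

The question the talk addresses is whether these iterates converge when (beta1, beta2) are tuned for the problem at hand, rather than fixed before the problem is chosen.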

Short Bio:

Yushun Zhang is currently a fourth-year Ph.D. student in the School of Data Science at The Chinese University of Hong Kong, Shenzhen, China, working with Prof. Zhi-Quan (Tom) Luo. Previously, he did his undergraduate studies in the Department of Mathematics at Southern University of Science and Technology. His research interests lie in optimization, deep learning, and, most recently, large language models. He aims to solve practical engineering problems in these areas.