Yushun Zhang The Chinese University of Hong Kong
Time: 2023-09-28, 09:00–10:00
Adam is one of the most popular algorithms in deep learning, used in many important applications including ChatGPT. Despite its popularity, the theoretical properties of Adam were largely unknown, and how to tune Adam was not clear. The ICLR 2018 Best Paper, Reddi et al. (2018), pointed out that Adam can diverge, and since then many variants of Adam have been proposed. However, vanilla Adam remains exceptionally popular and works well in practice. Why is there a gap between theory and practice? We point out an important mismatch between the settings of theory and practice: Reddi et al. (2018) pick the problem after picking the hyperparameters of Adam, i.e., (beta1, beta2), while practical applications often fix the problem first and then tune (beta1, beta2). We conjecture that in the latter, practical setting, i.e., when tuning hyperparameters is allowed, Adam can converge. In this talk, we present our recent findings that confirm this conjecture. More specifically, we show that when beta2 is large enough and beta1 < sqrt(beta2), vanilla Adam converges without any modification. Our results lead to suggestions on how to tune Adam hyperparameters: when Adam does not work well, we suggest tuning up beta2 and trying beta1 < sqrt(beta2). These results are confirmed by numerical experiments.
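The tuning suggestion above can be illustrated with a minimal, self-contained sketch of vanilla Adam on a toy quadratic. This is an illustrative implementation, not code from the talk; the function names, learning rate, and step count are assumptions.

```python
import math

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One vanilla Adam update for a scalar parameter theta."""
    m = beta1 * m + (1 - beta1) * grad        # first-moment (momentum) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)              # bias corrections
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

def in_convergent_regime(beta1, beta2):
    """Check the condition suggested in the talk: beta1 < sqrt(beta2)."""
    return beta1 < math.sqrt(beta2)

# Minimize f(x) = x^2 from x = 5.0 with the default (beta1, beta2),
# which satisfy beta1 = 0.9 < sqrt(0.999) ≈ 0.9995.
theta, m, v = 5.0, 0.0, 0.0
for t in range(1, 5001):
    grad = 2 * theta
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.01)
print(in_convergent_regime(0.9, 0.999))  # True
print(abs(theta) < 0.1)                  # iterate has reached a small neighborhood of 0
```

Note that the checker only tests the sufficient condition from the talk; the result also requires beta2 to be large enough, which a one-line check cannot capture.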
Yushun Zhang is currently a fourth-year Ph.D. student in the School of Data Science at The Chinese University of Hong Kong, Shenzhen, China, working with Prof. Zhi-Quan (Tom) Luo. Previously, he did his undergraduate study in the Department of Mathematics at Southern University of Science and Technology. His research interests lie in optimization, deep learning, and, most recently, large language models. He aims to solve practical engineering problems in these areas.