The optimization and generalization of neural networks remain mysterious: simple gradient methods can find global minima of highly non-convex loss functions, and neural networks can generalize well despite being heavily over-parameterized. In the first part of the talk, I will show a connection between ultra-wide neural networks and kernel methods with a particular kernel function, the neural tangent kernel (NTK), and use this connection to explain the optimization and generalization behavior of neural networks. Furthermore, I will show how the NTK can help us understand the benefit of certain layers in neural network architectures. In the second part of the talk, I will show that the NTK itself can be useful in practice. In several settings, including categorical data classification, graph classification, and few-shot learning, the NTK enjoys superior empirical performance.
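To make the correspondence concrete, here is a minimal sketch (my own illustration, not code from the talk) of the *empirical* NTK of a two-layer ReLU network at random initialization: the kernel entry K(x, x') is the inner product of the parameter gradients of the network output at x and x'. As the width grows, this random kernel concentrates around the deterministic NTK that governs the training dynamics of the ultra-wide network. All names and the specific architecture below are assumptions for illustration.

```python
import numpy as np

# Two-layer ReLU network: f(x) = (1/sqrt(m)) * sum_r a_r * relu(w_r . x)
# (a standard NTK-style parameterization; widths/dims chosen arbitrarily here).
rng = np.random.default_rng(0)
d, m = 5, 10_000                       # input dimension, hidden width
W = rng.standard_normal((m, d))        # first-layer weights ~ N(0, 1)
a = rng.choice([-1.0, 1.0], size=m)    # second-layer weights

def grad_features(x):
    """Gradient of f(x) with respect to all parameters (W and a), flattened.
    The empirical NTK is the inner product of two such gradient vectors."""
    z = W @ x
    act = np.maximum(z, 0.0)
    # df/da_r = relu(w_r . x) / sqrt(m)
    g_a = act / np.sqrt(m)
    # df/dw_r = a_r * 1[w_r . x > 0] * x / sqrt(m)
    g_W = (a * (z > 0.0))[:, None] * x[None, :] / np.sqrt(m)
    return np.concatenate([g_W.ravel(), g_a])

x1 = rng.standard_normal(d)
x2 = rng.standard_normal(d)
ntk_12 = grad_features(x1) @ grad_features(x2)   # empirical NTK entry K(x1, x2)
print(ntk_12)
```

With large width m, rerunning this with a fresh random initialization changes `ntk_12` only slightly, which is the concentration phenomenon underlying the wide-network/kernel correspondence.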
Simon Shaolei Du is a postdoctoral researcher at the Institute for Advanced Study in Princeton, hosted by Sanjeev Arora. He completed his Ph.D. in Machine Learning at Carnegie Mellon University, where he was co-advised by Aarti Singh and Barnabás Póczos. Previously, he studied EECS and EMS at UC Berkeley. He has also spent time at the Simons Institute and the research labs of Facebook, Google, and Microsoft. His research interests lie broadly in machine learning, including deep learning, reinforcement learning, transfer learning, (non-)convex optimization, non-parametric estimation, robust statistics, and matrix analysis. His research goal is to develop theoretically principled methods that improve practical performance.