Machine learning has become one of the most exciting research areas in the world, with various applications. However, there exists a noticeable gap between theory and practice. On one hand, simple algorithms like stochastic gradient descent (SGD) work very well in practice but lack satisfactory theoretical explanations. On the other hand, algorithms from the theory community, although they come with solid guarantees, tend to be less efficient than the techniques widely used in practice, which are usually hand-tuned or ad hoc and based on intuition.
In this talk, I would like to discuss my efforts to bridge theory and practice from two directions. The first direction is “practice to theory”, i.e., explaining and analyzing existing algorithms and empirical observations in machine learning. I will first briefly talk about how SGD escapes saddle points, and then present a two-phase convergence analysis of SGD for a two-layer neural network with ReLU activation.
The other direction is “theory to practice”, i.e., using deep theoretical tools to obtain new, better, and practical algorithms. Along this direction, I will introduce our new algorithm, Harmonica, which uses Fourier analysis and compressed sensing to tune hyperparameters. Harmonica supports parallel sampling and works well for tuning neural networks with 30+ hyperparameters.
Yang Yuan is a sixth-year CS PhD candidate at Cornell University, advised by Professor Robert Kleinberg. He completed his undergraduate studies at Peking University (2008-2012). He was a visiting student at MIT/Microsoft New England (2014-2015) and Princeton University (Fall 2016). He works on topics at the intersection of machine learning, theory, and optimization.