In this talk, I will present recent progress on understanding deep neural networks by analyzing the trajectory of the gradient descent algorithm. Using this analysis technique, we are able to explain:
1) why gradient descent finds a global minimum of the training loss even though the objective function is highly non-convex, and
2) why a neural network can generalize even when the number of parameters in the network exceeds the number of training examples.
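As a toy illustration of the first phenomenon (not code from the talk; the network width, learning rate, and data below are arbitrary choices), plain gradient descent on a heavily overparameterized two-layer ReLU network can drive the non-convex training loss close to zero:

```python
import numpy as np

# Toy setup: n training points in dimension d, hidden width m >> n,
# so the network has far more parameters than data points.
rng = np.random.default_rng(0)
n, d, m = 5, 3, 200
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

W = rng.standard_normal((m, d)) / np.sqrt(d)       # trainable first layer
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)   # fixed second layer

def loss(W):
    pred = np.maximum(X @ W.T, 0.0) @ a            # two-layer ReLU network
    return 0.5 * np.mean((pred - y) ** 2)

initial_loss = loss(W)
lr = 0.5
for step in range(5000):
    H = X @ W.T                        # pre-activations, shape (n, m)
    pred = np.maximum(H, 0.0) @ a
    resid = (pred - y) / n             # dL/dpred
    # Gradient w.r.t. W: chain rule through the ReLU indicator (H > 0).
    grad = (resid[:, None] * (H > 0) * a[None, :]).T @ X
    W -= lr * grad

print(f"loss: {initial_loss:.3f} -> {loss(W):.2e}")
```

Despite the non-convexity, the loss decreases essentially monotonically to near zero in this regime, which is the kind of trajectory behavior the analysis makes precise.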
This talk is based on joint work with Sanjeev Arora, Wei Hu, Jason D. Lee, Haochuan Li, Zhiyuan Li, Barnabas Poczos, Aarti Singh, Liwei Wang, Ruosong Wang, and Xiyu Zhai.
Simon Shaolei Du is a Ph.D. student in the Machine Learning Department at the School of Computer Science, Carnegie Mellon University, advised by Professor Aarti Singh and Professor Barnabás Póczos. His research interests broadly include topics in theoretical machine learning and statistics, such as deep learning, matrix factorization, convex/non-convex optimization, transfer learning, reinforcement learning, non-parametric statistics, and robust statistics. In 2011, he earned his high school degree from The Experimental High School Attached to Beijing Normal University. In 2015, he obtained his B.S. in Engineering Math & Statistics and B.S. in Electrical Engineering & Computer Science from the University of California, Berkeley. He has also spent time at the research labs of Microsoft and Facebook.