Non-convex optimization is the predominant method of choice for training machine learning models in many applications. A fundamental question towards better understanding of non-convex methods is their generalization performance: how much data is needed to train a good model? In the first part of the talk, I will consider gradient-based methods in over-parameterized models such as matrix and tensor factorizations. This is motivated by the observation that neural networks are often trained with more parameters than number of observations in practice. Our results show that the generalization performance crucially depends on the initialization. Meanwhile, adding parameters helps optimization by avoiding bad local minima.
In the second part of the talk, I will show how generalization theory can inform model selection for tensor completion. This is the problem of predicting the missing entries of a tensor given partial observations. I will present theoretical guarantees for a family of quadratic tensor models, which have emerged from applications such as recommendation systems and knowledge base completion. On a variety of real-world data where only limited amount of observations are available, we found that quadratic tensor models outperform other traditional models.
Hongyang Zhang is a Ph.D. candidate in the CS Department at Stanford University, co-advised by Ashish Goel and Greg Valiant. His research interests lie in machine learning and algorithms, including topics such as neural networks, matrix and tensor factorizations, non-convex optimization and social network analysis. He is a recipient of the best paper at COLT'18.