Speaker: Yunbei Xu (MIT)
Time: 2024-01-03, 10:30-11:30
Venue: FIT 1-222
Abstract:
We develop a general theory to optimize the frequentist regret of sequential learning problems, from which efficient bandit and reinforcement learning algorithms can be derived via unified Bayesian principles. We propose a novel optimization approach that generates “algorithmic beliefs” at each round and then uses Bayesian posteriors to make decisions. The optimization objective used to create these algorithmic beliefs, which we term the “Algorithmic Information Ratio,” is an intrinsic complexity measure that effectively characterizes the frequentist regret of any algorithm. This is the first approach to make Bayesian-type algorithms prior-free and applicable to adversarial settings in a generic and optimal manner. Moreover, the resulting algorithms are simple and often efficient to implement. As a major application, we present a novel algorithm for multi-armed bandits that achieves “best-of-all-worlds” empirical performance in stochastic, adversarial, and non-stationary environments. We also illustrate how these principles apply to linear bandits, bandit convex optimization, and reinforcement learning.
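For readers unfamiliar with the setup, the sketch below illustrates the round-by-round structure the abstract describes, for a K-armed Bernoulli bandit: form a belief at each round, then act through the Bayesian posterior. This is only a minimal illustration: the belief step here is a standard conjugate Beta update used as a placeholder, whereas the talk's actual construction optimizes the belief by minimizing the Algorithmic Information Ratio, which is not reproduced here. All names and parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

K, T = 5, 2000                      # number of arms, number of rounds
true_means = rng.uniform(0, 1, K)   # unknown Bernoulli means (the environment)

# Belief state: Beta(alpha, beta) per arm. In the talk's framework this belief
# is *optimized* each round via the Algorithmic Information Ratio; here a
# conjugate Bayesian update stands in as a placeholder.
alpha = np.ones(K)
beta = np.ones(K)

total_reward = 0.0
for t in range(T):
    # Step 1 (placeholder): form this round's belief from the current state.
    # Step 2: make a Bayesian posterior decision (Thompson-style sampling).
    sampled_means = rng.beta(alpha, beta)
    arm = int(np.argmax(sampled_means))

    # Observe a Bernoulli reward and update the belief state.
    reward = float(rng.random() < true_means[arm])
    alpha[arm] += reward
    beta[arm] += 1.0 - reward
    total_reward += reward

print(f"average reward: {total_reward / T:.3f}, "
      f"best arm mean: {true_means.max():.3f}")
```

Per the abstract, replacing the conjugate update in Step 1 with a per-round optimization over beliefs is what removes the prior and extends the guarantees to adversarial and non-stationary settings.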
Short Bio:
Yunbei Xu is currently a postdoctoral associate at MIT under the mentorship of Alexander Rakhlin. He earned his Ph.D. from Columbia University in 2023 under the guidance of Assaf Zeevi, and his B.Sc. from Peking University in 2018. Yunbei has broad interests in the mathematics of AI and physics. His work has been recognized with an ICML Outstanding Paper Award, first place in the INFORMS George Nicholson Student Paper Competition, and two finalist selections for the Applied Probability Society Best Student Paper Award.