Traditional multi-armed bandit models posit that the payoff distribution of each action (or "arm") is stationary over time, and hence that the goal of learning is to identify the arm with the highest expected payoff and choose that one forever after. However, in many applications the efficacy of an action depends on the amount of time that has elapsed since it was last performed. Examples arise in precision agriculture, online education, and music recommendations. In this talk we introduce a generalization of the multi-armed bandit problem that models such applications. In the course of analyzing algorithms for this problem, we will encounter some interesting combinatorial questions about coloring the integers subject to bounds on the sizes of subintervals that exclude a given color. This talk is based on joint work with Nicole Immorlica that appeared in FOCS 2018.
Bobby Kleinberg is a Professor of Computer Science at Cornell University. His research pertains to the design and analysis of algorithms and their applications to machine learning, economics, networking, and other areas. Prior to receiving his doctorate from MIT in 2005, Kleinberg spent three years at Akamai Technologies; he and his co-workers received the 2018 SIGCOMM Networking Systems Award for pioneering the world's first and largest Internet content distribution network. He is also the recipient of a Microsoft Research New Faculty Fellowship, an Alfred P. Sloan Foundation Fellowship, and an NSF CAREER Award.