The k-means algorithm is widely used for clustering data. No general proof of convergence assuming reasonable conditions is known. Here, we show that the k-means algorithm converges to the true cluster centers provided
(i) Each (most) data points are closer to their own cluster center than the center of any other cluster and
(ii) we start with centers close (in a well-defined sense) to the true centers.
(iii) We also show that simple Principal Component Analysis finds us starting centers satisfying (ii).
(iv) We show that in all stochastic generative models (like Gaussian mixtures and planted partition models) so far studied, our proximity condition (i) holds so that our result subsumes earlier results.
Joint work with Amit Kumar, IIT, Delhi.
Prof. Ravi Kannan is the Principal Researcher in Algorithms Research Group of Microsoft Research Lab, India. He is also an Adjunct Professor in the Department of Computer Science and Automation, Indian Institute of Science. His research interests include Algorithms, Theoretical Computer Science and Discrete Mathematics as well as Optimization. His work has mainly focused on efficient algorithms for problems of a mathematical (often geometric) flavor that arise in Computer Science. He has worked on algorithms for Integer Programming and the Geometry of Numbers, Random Walks in n-space, Randomized Algorithms for Linear Algebra and Learning Algorithms for convex sets.
He was awarded the Fulkerson Prize in Discrete Mathematics in 1991 by the Mathematical programming Society and the American Mathematical Society for his work on estimating the volume of convex sets. He was awarded the Distinguished Alumnus award of the Indian Institute of Technology, Bombay in 1999. He was a plenary speaker at the 1997 IEEE Symposium on the Foundations of Computer Science.
He has served on the editorial board of several journals and on program committees. He was the Chair of the Program Committee for the second Integer Programming and Combinatorial Optimization conference in 1988 and the Chair of the Fulkerson Prize Committee in 2000.