D. Arthur, S. Vassilvitskii. k-means++: The Advantages of Careful Seeding. Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, 2007.

Arthur and Vassilvitskii propose a novel initialization for standard $K$-means. The initialization step is summarized as follows: Given a set of data points $X \subset \mathbb{R}^D$ and a number of clusters $K$:

  1. Randomly choose the first center $\mu_1 \in X$.
  2. For $k = 2,\ldots,K$, choose center $\mu_k$ as $\mu_k = x_n \in X$ with probability proportional to:

$\frac{d(x_n)}{\sum_{x_{n'}} d(x_{n'})}$ with $d(x_n) = \min_{1 \leq k' < k} \{\|x_n - \mu_{k'}\|_2\}$

This type of initialization is commonly implemented in standard $K$-means implementations, for example in OpenCV.

What is your opinion on this article? Let me know your thoughts on Twitter @davidstutz92 or LinkedIn in/davidstutz92.