D. Arthur, S. Vassilvitskii. k-means++: The Advantages of Careful Seeding. Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, 2007.

Arthur and Vassilvitskii propose a novel initialization scheme for standard $K$-means. Given a set of data points $X \subset \mathbb{R}^D$ and a number of clusters $K$, the initialization can be summarized as follows:

  1. Randomly choose the first center $\mu_1 \in X$.
  2. For $k = 2,\ldots,K$, choose center $\mu_k = x_n \in X$ with probability proportional to the squared distance to the nearest center chosen so far:

$\frac{d(x_n)^2}{\sum_{x_{n'}} d(x_{n'})^2}$ with $d(x_n) = \min_{1 \leq k' < k} \|x_n - \mu_{k'}\|_2$
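The two steps above can be sketched in a few lines of NumPy. This is a minimal illustration of the seeding procedure only (not the full $K$-means loop); the function name and `rng` parameter are my own choices, not from the paper:

```python
import numpy as np

def kmeans_pp_init(X, K, rng=None):
    """k-means++ seeding: pick K rows of X as initial centers, each new
    center drawn with probability proportional to its squared distance
    to the nearest already-chosen center."""
    rng = np.random.default_rng(rng)
    N = X.shape[0]
    # Step 1: first center uniformly at random from X.
    centers = [X[rng.integers(N)]]
    # Step 2: remaining centers via D^2 weighting.
    for _ in range(1, K):
        C = np.asarray(centers)
        # squared distance of every point to its closest existing center
        d2 = np.min(((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1), axis=1)
        probs = d2 / d2.sum()
        centers.append(X[rng.choice(N, p=probs)])
    return np.asarray(centers)
```

For well-separated clusters, the $D^2$ weighting makes it very likely that each cluster contributes one initial center, which is exactly the behavior the plain uniform initialization lacks.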

This initialization has become common in standard $K$-means implementations, for example in OpenCV.

What is your opinion of the summarized work? Do you know related work that is of interest? Let me know your thoughts in the comments below!