Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

K-means clustering uniqueness of solution

Does the k-means clustering algorithm always yield the same solution? The initialization is supposed to be random, so does the clustering converge to the same result regardless of the initialization?

like image 318
Igor Ševo Avatar asked Jan 21 '14 13:01

Igor Ševo


People also ask

Does k-means give unique solution?

No, it doesn't.

What are some challenges with using k-means for clustering?

k-means has trouble clustering data where clusters are of varying sizes and density. To cluster such data, you need to generalize k-means as described in the Advantages section. Clustering outliers. Centroids can be dragged by outliers, or outliers might get their own cluster instead of being ignored.

How do you choose the best value for k clusters?

There is a popular method known as elbow method which is used to determine the optimal value of K to perform the K-Means Clustering Algorithm. The basic idea behind this method is that it plots the various values of cost with changing k. As the value of K increases, there will be fewer elements in the cluster.

Does k-means find global solution?

The algorithm does not guarantee convergence to the global optimum. The result may depend on the initial clusters. As the algorithm is usually fast, it is common to run it multiple times with different starting conditions.


1 Answers

The initialization is supposed to be random, so does the clustering converge to the same result regardless of the initialization?

Quite the contrary. If the k-means problem were a nice, convex optimization problem, we wouldn't be randomly initializing it, since simply starting at (0,0,...,0) would give the right answer.

The reason for random initialization is exactly that you can get different solutions by trying different random seeds, then pick the best one when all your k-means runs are done. Ten runs is a good rule of thumb for many applications.

Finding the global minimum of the k-means problem is NP-hard in general. The common algorithm is really a heuristic.

like image 68
Fred Foo Avatar answered Sep 23 '22 04:09

Fred Foo