Does the k-means clustering algorithm always yield the same solution? The initialization is supposed to be random, so does the clustering converge to the same result regardless of the initialization?
No, it doesn't.
k-means has trouble clustering data where clusters vary in size and density. To cluster such data, you need a generalization of k-means. Outliers are a second problem: centroids can be dragged by outliers, or outliers might get their own cluster instead of being ignored.
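To make the outlier effect concrete, here is a minimal sketch. It is not a full k-means run: it only recomputes a centroid (the mean of the points assigned to a cluster) with and without one far-away point. The data and the helper name `centroid` are made up for illustration.

```python
def centroid(points):
    """Mean of a list of 2-D points, which is what k-means uses as the cluster center."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

# Four tightly grouped points (hypothetical data).
cluster = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (2.0, 2.0)]
# One distant point that ends up assigned to the same cluster.
outlier = (26.5, 26.5)

clean_center = centroid(cluster)                # (1.5, 1.5)
dragged_center = centroid(cluster + [outlier])  # (6.5, 6.5)

print("without outlier:", clean_center)
print("with outlier:   ", dragged_center)
```

A single outlier among five points moves the center outside the square spanned by the four original points, so the centroid no longer represents the bulk of the cluster.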
A popular way to choose the value of K for k-means is the elbow method. The basic idea is to plot the cost (the within-cluster sum of squared distances) against K. As K increases, each cluster contains fewer points, so the cost keeps falling. The "elbow" is the value of K beyond which adding more clusters reduces the cost only slightly.
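The shape of the elbow curve can be sketched on a toy example. To keep the numbers deterministic, the code below does not run k-means at all. For tiny one-dimensional data it finds the exact optimal cost by brute force, relying on the fact that optimal 1-D clusters are contiguous runs of the sorted values. In practice you would take the cost from actual k-means runs instead. The data and the function names `sse` and `optimal_cost_1d` are assumptions made for this illustration.

```python
from itertools import combinations

def sse(segment):
    """Sum of squared distances of the values in a segment to their mean."""
    mean = sum(segment) / len(segment)
    return sum((x - mean) ** 2 for x in segment)

def optimal_cost_1d(values, k):
    """Lowest possible k-means cost for 1-D data, found by trying every
    way to cut the sorted values into k contiguous segments."""
    xs = sorted(values)
    n = len(xs)
    best = float("inf")
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0, *cuts, n)
        cost = sum(sse(xs[a:b]) for a, b in zip(bounds, bounds[1:]))
        best = min(best, cost)
    return best

# Three obvious groups of three values each (hypothetical data).
data = [1, 2, 3, 11, 12, 13, 21, 22, 23]
costs = {k: optimal_cost_1d(data, k) for k in range(1, 7)}

for k, cost in costs.items():
    print(f"K={k}: cost={cost}")
```

On this data the cost falls from 606 at K=1 to 156 at K=2 and 6 at K=3, then barely changes, so the elbow sits at K=3, the number of groups built into the data.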
The algorithm does not guarantee convergence to the global optimum. The result may depend on the initial clusters. As the algorithm is usually fast, it is common to run it multiple times with different starting conditions.
The initialization is supposed to be random, so does the clustering converge to the same result regardless of the initialization?
Quite the contrary. If the k-means problem were a nice, convex optimization problem, we wouldn't be randomly initializing it, since any fixed starting point, such as (0,0,...,0), would lead to the right answer.
The reason for random initialization is exactly that you can get different solutions by trying different random seeds, then pick the best one (the run with the lowest final cost) when all your k-means runs are done. Ten runs is a good rule of thumb for many applications.
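A small sketch of both points, assuming the standard Lloyd iteration (assign each point to its nearest centroid, then move each centroid to the mean of its points). The four-point data set, the function names `lloyd` and `best_of_restarts`, and the choice to seed each run by sampling k data points are all assumptions made for illustration, not a reference implementation.

```python
import random

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def lloyd(points, centroids, max_iter=100):
    """Run Lloyd's iterations from the given starting centroids.
    Returns (final_centroids, cost), where cost is the total squared
    distance from each point to its nearest centroid."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its old centroid).
        new_centroids = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else old
            for members, old in zip(clusters, centroids)
        ]
        if new_centroids == centroids:
            break  # converged: nothing moved
        centroids = new_centroids
    cost = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, cost

def best_of_restarts(points, k, n_runs=10):
    """Run Lloyd's algorithm n_runs times, each started from k data points
    drawn with a different seed, and keep the lowest-cost run."""
    runs = []
    for seed in range(n_runs):
        init = random.Random(seed).sample(points, k)
        runs.append(lloyd(points, init))
    all_costs = [cost for _, cost in runs]
    return min(runs, key=lambda run: run[1]), all_costs

# Corners of a wide, flat rectangle (hypothetical data).
corners = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Two hand-picked starts that converge to different fixed points.
_, bad_cost = lloyd(corners, [(2, 0), (2, 1)])       # splits bottom/top: cost 16.0
_, good_cost = lloyd(corners, [(0, 0.5), (4, 0.5)])  # splits left/right: cost 1.0

(best_centroids, best_cost), all_costs = best_of_restarts(corners, k=2)
print("hand-picked starts:", bad_cost, good_cost)
print("costs over restarts:", all_costs)
print("best run:", best_centroids, best_cost)
```

Both hand-picked starts are stable under the iteration, yet one is sixteen times worse than the other. That is the sense in which the result depends on initialization, and why keeping the cheapest of several runs is the usual remedy.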
Finding the global minimum of the k-means problem is NP-hard in general. The common algorithm is really a heuristic.