I am trying to implement k-means as a homework assignment. My exercise sheet gives the following remark regarding empty centers:
During the iterations, if any of the cluster centers has no data points associated with it, replace it with a random data point.
That confuses me a bit. Firstly, Wikipedia and the other sources I have read do not mention this at all. I have also read about the problem of 'choosing a good k for your data'. How is my algorithm supposed to converge if I keep setting new centers for clusters that were empty?
If I ignore empty clusters, the algorithm converges after 30-40 iterations. Is it wrong to ignore empty clusters?
The k-means algorithm is one of the most widely used clustering algorithms and has been applied in many fields of science and technology. One of its major problems is that it may produce empty clusters, depending on the initial center vectors.
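To make the dependence on the initial centers concrete, here is a minimal sketch of a single assignment step on made-up 1-D data. The helper names `assign` and `cluster_sizes` and the toy points are hypothetical, chosen only for illustration. One initial center is placed far from every data point, so no point picks it as its nearest center and its cluster comes out empty.

```python
def assign(points, centers):
    """Assignment step for 1-D data: index of the nearest center for each point."""
    return [min(range(len(centers)), key=lambda j: abs(p - centers[j])) for p in points]


def cluster_sizes(labels, k):
    """Count how many points were assigned to each of the k centers."""
    sizes = [0] * k
    for j in labels:
        sizes[j] += 1
    return sizes


points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
centers = [1.0, 11.0, 100.0]  # the third center is deliberately far from all data

labels = assign(points, centers)
print(labels)                               # [0, 0, 0, 1, 1, 1]
print(cluster_sizes(labels, len(centers)))  # [3, 3, 0]: cluster 2 is empty
```

With cluster 2 empty, the following update step has no points to average for that center, which is exactly the situation the exercise sheet's remark is about.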
The k-means algorithm is often used in clustering applications but its usage requires a complete data matrix. Missing data, however, is common in many applications.
Method for initialization: 'k-means++' selects initial cluster centers for k-means clustering in a smart way to speed up convergence (see the Notes section in k_init for more details); 'random' chooses n_clusters observations (rows) at random from the data as the initial centroids.
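The two initialization strategies named above can be sketched in plain Python as follows. This is an illustrative re-implementation under stated assumptions, not scikit-learn's code: `random_init` and `kmeanspp_init` are hypothetical names, points are assumed to be tuples of floats, and the k-means++ part is the basic variant (each new center drawn with probability proportional to its squared distance from the nearest center already chosen). scikit-learn's own version may differ in details.

```python
import math
import random


def random_init(points, k, rng):
    """'random' style: k distinct observations chosen uniformly at random."""
    return rng.sample(points, k)


def kmeanspp_init(points, k, rng):
    """Basic k-means++ seeding: far-away points are more likely to become centers."""
    centers = [rng.choice(points)]  # first center: uniform at random
    while len(centers) < k:
        # Squared distance from each point to its nearest already-chosen center.
        d2 = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        if sum(d2) == 0:
            # Every point coincides with a chosen center; fall back to a uniform pick.
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


rng = random.Random(42)
data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print(random_init(data, 2, rng))
print(kmeanspp_init(data, 2, rng))
```

Because a point that already coincides with a center gets weight zero, k-means++ tends to spread the initial centers out, which lowers (but does not eliminate) the chance of starting with a center that ends up owning no points.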
k-means clustering is a method of vector quantization, originally from signal processing, that aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean (cluster center or cluster centroid), serving as a prototype of the cluster.
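Written out, the objective behind this definition is to find a partition S = {S_1, ..., S_k} of the observations that minimizes the within-cluster sum of squares, where each mean is computed from its cluster's members (the symbols S_i and μ_i are notation chosen here for illustration):

```latex
\underset{\mathbf{S}}{\arg\min} \; \sum_{i=1}^{k} \sum_{\mathbf{x} \in S_i} \lVert \mathbf{x} - \boldsymbol{\mu}_i \rVert^2,
\qquad
\boldsymbol{\mu}_i = \frac{1}{\lvert S_i \rvert} \sum_{\mathbf{x} \in S_i} \mathbf{x}
```

The mean formula shows why empty clusters need special handling: if |S_i| = 0, the expression for μ_i becomes 0/0 and is undefined, so the update step needs some rule for that center, such as the reseeding with a random data point suggested on the exercise sheet.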
Check out this example of how empty clusters can happen: http://www.ceng.metu.edu.tr/~tcan/ceng465_f1314/Schedule/KMeansEmpty.html It basically means one of two things: either 1) a random tremor in the force (an unlucky initialization), or 2) the number of clusters k is wrong. You should iterate over a few different values of k and pick the best. If you encounter an empty cluster while iterating, place a random data point into that cluster and carry on. I hope this helped with your homework assignment last year.
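Here is a minimal pure-Python sketch of that advice, assuming points are tuples of floats and k is at most the number of points; `mean` and `kmeans` are hypothetical helper names. It is Lloyd-style k-means that reseeds an empty cluster's center with a random data point. This also addresses the convergence worry in the question. Moving an empty center does not change the current sum of squared errors (SSE), because no points belong to it. The assignment and mean-update steps never increase the SSE. So the objective still never goes up. The `max_iter` cap is a safety net against tie-induced oscillation. The loop at the end runs several values of k; since SSE tends to drop as k grows, "pick the best" is a judgment call (for example, looking for an "elbow" in the curve), and that criterion is an assumption, not something the answer specifies.

```python
import math
import random


def mean(cluster):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def kmeans(points, k, rng, max_iter=100):
    """Lloyd-style k-means that reseeds empty clusters with a random data point.

    Returns (centers, labels, sse), where sse is the within-cluster sum of squares.
    """
    centers = rng.sample(points, k)  # 'random' initialization: k observations
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # no point changed cluster, so the algorithm has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = mean(members)
            else:
                # Empty cluster: replace its center with a random data point,
                # as the exercise sheet suggests, and carry on.
                centers[j] = rng.choice(points)
    sse = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return centers, labels, sse


# Try a few values of k on made-up data and compare the resulting SSE.
points = [(0.0, 0.0), (0.5, 0.2), (0.1, 0.6), (8.0, 8.0), (8.3, 7.7), (7.9, 8.4)]
for k in range(1, 5):
    _, _, sse = kmeans(points, k, random.Random(0))
    print(f"k={k}: SSE={sse:.3f}")
```

Simply ignoring an empty cluster, as in the question, also terminates, but it silently leaves you with fewer than k useful clusters; reseeding keeps all k centers in play at no cost to the objective.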