 

k-means empty cluster

Tags: k-means

I am trying to implement k-means as a homework assignment. My exercise sheet gives the following remark regarding empty clusters:

During the iterations, if any of the cluster centers has no data points associated with it, replace it with a random data point.

That confuses me a bit. Firstly, neither Wikipedia nor the other sources I have read mention this at all. Secondly, I have read about the problem of 'choosing a good k for your data': how is my algorithm supposed to converge if I keep setting new centers for clusters that were empty?

If I ignore empty clusters, the algorithm converges after 30-40 iterations. Is it wrong to ignore empty clusters?

asked Jun 17 '12 22:06 by toobee



1 Answer

Check out this example of how empty clusters can happen: http://www.ceng.metu.edu.tr/~tcan/ceng465_f1314/Schedule/KMeansEmpty.html

An empty cluster basically means either 1) a random tremor in the force (an unlucky choice of initial centers), or 2) the number of clusters k is wrong. You should try a few different values of k and pick the best. If you encounter an empty cluster while iterating, replace its center with a random data point and carry on. I hope this helped with your homework assignment last year.
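To make the "replace it with a random data point" rule concrete, here is a minimal sketch in plain Python. It is not taken from the exercise sheet: the names `kmeans` and `sq_dist`, the `seed` and `max_iter` parameters, and the stopping rule (stop once assignments no longer change and no center was re-seeded in the previous pass) are all assumptions made for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd-style k-means that re-seeds empty clusters (illustrative sketch).

    points: list of equal-length numeric tuples, with len(points) >= k.
    Returns (centers, assignment), where assignment[i] is the cluster index
    of points[i].
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers: k distinct list entries
    assignment = None
    reseeded = False
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_assignment = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        # Stop only if nothing changed AND no center was re-seeded last pass.
        if new_assignment == assignment and not reseeded:
            break
        assignment = new_assignment
        reseeded = False
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
            else:
                # Empty cluster: replace its center with a random data point.
                centers[j] = rng.choice(points)
                reseeded = True
    # If max_iter is hit, centers reflect one update after the last assignment.
    return centers, assignment


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centers, labels = kmeans(data, k=2)
    print(centers, labels)
```

Because a re-seeded center sits exactly on a data point, that point normally joins the cluster on the next assignment step, so the cluster stops being empty and the loop keeps running until assignments settle again.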

answered Oct 16 '22 06:10 by offwhitelotus