Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MATLAB kMeans does not always converge to global minima

I wrote a k-Means clustering algorithm in MATLAB, and I thought I'd try it against MATLABs built in kmeans(X,k).

However, for the very easy four cluster setup (see picture), MATLAB kMeans does not always converge to the optimum solution (left) but to (right).

The one I wrote does not always do that either, but should not the built-in function be able to solve such an easy problem, always finding the optimal solution?

alt text

like image 621
Theodor Avatar asked Sep 07 '10 10:09

Theodor


1 Answers

As @Alexandre C. explained, the K-means algorithm depends on the initial cluster centroid positions, and there is no guarantee that it will converge to the optimal solution.

The best you can do is to repeat the experiment several times with random starting points.

MATLAB's implementation offers such an option: replicates which repeats the clustering N times and pick the one with the lowest total within-cluster point-to-centroid distances. You get also to control how the initial centroids are chosen with the start option.

In addition, MATLAB provides the choice among a number of distance measures (Euclidean, Manhattan, Cosine, ...). One neat option emptyaction allows you to control what happens when a cluster loses all its assigned member during the iterations.

But the real advantage is that it employs a two-phase algorithm: the usual assign-recompute iterations, followed by an online update phase. Be sure to read the algorithm section of the documentation page for more information.

like image 98
Amro Avatar answered Sep 30 '22 13:09

Amro