I wrote a k-Means clustering algorithm in MATLAB, and I thought I'd try it against MATLABs built in kmeans(X,k)
.
However, for the very easy four cluster setup (see picture), MATLAB kMeans does not always converge to the optimum solution (left) but to (right).
The one I wrote does not always do that either, but should not the built-in function be able to solve such an easy problem, always finding the optimal solution?
As @Alexandre C. explained, the K-means algorithm depends on the initial cluster centroid positions, and there is no guarantee that it will converge to the optimal solution.
The best you can do is to repeat the experiment several times with random starting points.
MATLAB's implementation offers such an option: replicates
which repeats the clustering N times and pick the one with the lowest total within-cluster point-to-centroid distances.
You get also to control how the initial centroids are chosen with the start
option.
In addition, MATLAB provides the choice among a number of distance measures (Euclidean, Manhattan, Cosine, ...). One neat option emptyaction
allows you to control what happens when a cluster loses all its assigned member during the iterations.
But the real advantage is that it employs a two-phase algorithm: the usual assign-recompute iterations, followed by an online update phase. Be sure to read the algorithm section of the documentation page for more information.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With