Should we use k-means++ instead of k-means?

Question

The k-means++ algorithm addresses the following two weaknesses of the original k-means algorithm:

The original k-means algorithm has a worst-case running time that is super-polynomial in the input size, while k-means++ is claimed to be O(log k).
The clustering it finds can be arbitrarily poor, with respect to the objective function, compared to the optimal clustering.

But are there any drawbacks to k-means++? Should we always use it instead of k-means from now on?

Fred Foo · Accepted Answer

Nobody claims k-means++ runs in O(lg k) time; its solution quality is, in expectation, O(lg k)-competitive with the optimal solution. Both k-means++ and the common method, called Lloyd's algorithm, are approximations to an NP-hard optimization problem.
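For reference, the bound is on the expected clustering cost, not on running time. Writing \phi for the k-means objective (the sum of squared distances from each point to its nearest center) and \phi_{OPT} for its optimal value (both symbols are notation introduced here), the guarantee stated in Arthur & Vassilvitskii's paper for the seeding step is:

$$
\mathbb{E}[\phi] \le 8(\ln k + 2)\,\phi_{OPT}
$$

Running Lloyd's iterations afterwards can only lower \phi, so the bound carries over to the final clustering.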

I'm not sure what the worst-case running time of k-means++ is; note that in Arthur & Vassilvitskii's original description, steps 2-4 of the algorithm are simply Lloyd's algorithm, so only the initialization step differs. They do claim that it works both better and faster in practice because it starts from a better position.
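To make the "seeding, then Lloyd's" structure concrete, here is a minimal pure-Python sketch. The function names (sq_dist, kmeanspp_seeds, lloyd, cost) are made up for illustration and are not from the paper or any library; points are assumed to be tuples of floats.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeanspp_seeds(points, k, rng):
    """Choose k initial centers: the first uniformly at random, each later
    one with probability proportional to its squared distance to the
    nearest center chosen so far (D^2 sampling)."""
    centers = [rng.choice(points)]
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            nxt = rng.choice(points)
        else:
            nxt = rng.choices(points, weights=d2, k=1)[0]
        centers.append(nxt)
        # Update each point's distance to its nearest center.
        d2 = [min(d, sq_dist(p, nxt)) for p, d in zip(points, d2)]
    return centers

def lloyd(points, centers, max_iter=100):
    """Plain Lloyd iterations: assign points to the nearest center, then
    move each center to the mean of its cluster, until nothing changes."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[j].append(p)
        new_centers = [
            tuple(sum(x) / len(c) for x in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)  # an empty cluster keeps its center
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers

def cost(points, centers):
    """k-means objective: sum of squared distances to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)
```

Swapping kmeanspp_seeds for rng.sample(points, k) gives the usual randomly initialized Lloyd's algorithm, which makes it easy to compare cost(points, lloyd(points, seeds)) for both initializations on your own data.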

The drawbacks of k-means++ are thus:

It too can find a suboptimal solution (it's still an approximation).
It's not consistently faster than Lloyd's algorithm (see Arthur & Vassilvitskii's tables).
It's more complicated than Lloyd's algorithm.
It's relatively new, while Lloyd's has proven its worth for over 50 years.
Better algorithms may exist for specific metric spaces.

That said, if your k-means library supports k-means++, then by all means try it out.

Tags:

performance

algorithm

comparison

cluster-analysis

k-means

Karl
