The k-means++ algorithm helps in two following points of the original k-means algorithm:
But are there any drawbacks of k-means++? Should we always used it instead of k-means from now on?
Nobody claims k-means++ runs in O(lg k) time; it's solution quality is O(lg k)-competitive with the optimal solution. Both k-means++ and the common method, called Lloyd's algorithm, are approximations to an NP-hard optimization problem.
I'm not sure what the worst case running time of k-means++ is; note that in Arthur & Vassilvitskii's original description, steps 2-4 of the algorithm refer to Lloyd's algorithm. They do claim that it works both better and faster in practice because it starts from a better position.
The drawbacks of k-means++ are thus:
That said, if your k-means library supports k-means++, then by all means try it out.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With