I was going through the k-means Wikipedia page. Based on the algorithm, I think the complexity is <code>O(n*k*i)</code> (<code>n</code> = total elements, <code>k</code> = number of cluster iteration) So can someone explain me this statement from Wikipedia and how is this NP hard? <blockquote> If <code>k</code> and <code>d</code> (the dimension) are fixed, the problem can be exactly solved in time <code>O(ndk+1 log n)</code>, where <code>n</code> is the number of entities to be clustered. </blockquote>

It depends on what you call k-means. The problem of finding the global optimum of the k-means objective function <img src="https://i.stack.imgur.com/PGt9x.png" alt="enter image description here"> is NP-hard, where <code>Si</code> is the cluster <code>i</code> (and there are <code>k</code> clusters), <code>xj</code> is the <code>d</code>-dimensional point in cluster <code>Si</code> and <code>μi</code> is the centroid (average of the points) of cluster <code>Si</code>. However, running a fixed number <code>t</code> of iterations of the standard algorithm takes only <code>O(t*k*n*d)</code>, for <code>n</code> (<code>d</code>-dimensional) points, where <code>k</code>is the number of centroids (or clusters). This what practical implementations do (often with random restarts between the iterations). The standard algorithm only approximates a local optimum of the above function, and so do all the k-means algorithms that I've seen.

In this answer, note that <code>i</code> used in the k-means objective formula and <code>i</code> used in the analysis of the time complexity of k-means (that is, the number of iterations needed until convergence) are different.

What is the time complexity of k-means?

2 Answers

It depends on what you call k-means.

The problem of finding the global optimum of the k-means objective function

enter image description here

is NP-hard, where S_i is the cluster i (and there are k clusters), x_j is the d-dimensional point in cluster S_i and μ_i is the centroid (average of the points) of cluster S_i.

However, running a fixed number t of iterations of the standard algorithm takes only O(t*k*n*d), for n (d-dimensional) points, where kis the number of centroids (or clusters). This what practical implementations do (often with random restarts between the iterations).

The standard algorithm only approximates a local optimum of the above function, and so do all the k-means algorithms that I've seen.

185

answered Sep 26 '22 07:09

Fred Foo

In this answer, note that i used in the k-means objective formula and i used in the analysis of the time complexity of k-means (that is, the number of iterations needed until convergence) are different.

answered Sep 25 '22 07:09

masec

Related questions
                            
                                Converting Google Maps styles array to Google Static Maps styles string
                            
                                Extract second subelement of every element in a list while ignoring NA's in sapply in R
                            
                                Modal Window Issue (Unknown Provider: ModalInstanceProvider)
                            
                                SQL Server - Case Statement
                            
                                How to call super method when overriding a method through a trait
                            
                                How do I remove Maven from a Eclipse java project?
                            
                                Where can I find the WAMP error log?
                            
                                Using OperatingSystemMXBean to get CPU usage
                            
                                Wordpress: How to get the tag's slug on the tag page
                            
                                Django CMS fails to synch db or migrate
                            
                                Reading data into numpy array from text file
                            
                                Java Thread.stop() vs Thread.interrupt()

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

What is the time complexity of k-means?

Tags:

parallel

People also ask

2 Answers

Fred Foo

masec

Recent Activity

Donate For Us