Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the time complexity of k-means?

Tags:

I was going through the k-means Wikipedia page. Based on the algorithm, I think the complexity is O(n*k*i) (n = total elements, k = number of cluster iteration)

So can someone explain me this statement from Wikipedia and how is this NP hard?

If k and d (the dimension) are fixed, the problem can be exactly solved in time O(ndk+1 log n), where n is the number of entities to be clustered.

like image 771
parallel Avatar asked Sep 05 '13 10:09

parallel


People also ask

What is the time and space complexity of k-means?

The space complexity of K-means clustering algorithm is O(N(D + K)). Based on the number of distance calculations, the time complexity of K-means is O(NKI).

What is the space complexity of k-means algorithm for D dimensional feature vector N is the number of points and k is the number of cluster centroid?

Its space complexity is O(k+n).

What is K in K-means clustering?

To achieve this objective, K-means looks for a fixed number (k) of clusters in a dataset.” A cluster refers to a collection of data points aggregated together because of certain similarities. You'll define a target number k, which refers to the number of centroids you need in the dataset.

Why does k-means always converge?

The cycle can not have length greater than 1 because otherwise by (2) you would have some clustering which has a lower cost than itself which is impossible. Hence the cycle must have length exactly 1. Hence k-means converges in a finite number of iterations.


2 Answers

It depends on what you call k-means.

The problem of finding the global optimum of the k-means objective function

enter image description here

is NP-hard, where Si is the cluster i (and there are k clusters), xj is the d-dimensional point in cluster Si and μi is the centroid (average of the points) of cluster Si.

However, running a fixed number t of iterations of the standard algorithm takes only O(t*k*n*d), for n (d-dimensional) points, where kis the number of centroids (or clusters). This what practical implementations do (often with random restarts between the iterations).

The standard algorithm only approximates a local optimum of the above function, and so do all the k-means algorithms that I've seen.

like image 185
Fred Foo Avatar answered Sep 26 '22 07:09

Fred Foo


In this answer, note that i used in the k-means objective formula and i used in the analysis of the time complexity of k-means (that is, the number of iterations needed until convergence) are different.

like image 36
masec Avatar answered Sep 25 '22 07:09

masec