
Scikit-learn, KMeans: How to use max_iter

I'd like to understand the parameter max_iter from the class sklearn.cluster.KMeans.

According to the documentation:

max_iter : int, default: 300
Maximum number of iterations of the k-means algorithm for a single run.

But in my opinion, if I have 100 objects the code must run 100 times, and if I have 10,000 objects it must run 10,000 times, to classify every object. On the other hand, it makes no sense to run over all objects several times.

What is my misconception and how do I have to interpret this parameter?

C-Jay asked Dec 01 '16 10:12



2 Answers

Take a look here:

https://www.naftaliharris.com/blog/visualizing-k-means-clustering/

Each time you click "update centroids", a new iteration is performed. Repeating this makes sense: when the centroids move, the distances from the points to those centroids change as well, so some points may switch to a different cluster.
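To make the idea of an iteration concrete, here is a minimal NumPy sketch of the assign-then-update loop. It is not scikit-learn's implementation; the function name `lloyd_kmeans`, the random initialisation, and the `np.allclose` stopping check are assumptions chosen for illustration.

```python
import numpy as np

def lloyd_kmeans(X, k, max_iter=300, seed=0):
    """Illustrative k-means loop; returns (centroids, labels, iterations used)."""
    rng = np.random.default_rng(seed)
    # Assumption: start from k distinct data points picked at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for iteration in range(1, max_iter + 1):
        # Assignment step: every point is assigned to its nearest centroid.
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: each centroid moves to the mean of its assigned points.
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving, so we stop before max_iter
        centroids = new_centroids
    return centroids, labels, iteration

# Two well-separated blobs of 50 points each (synthetic data for illustration).
data_rng = np.random.default_rng(42)
X = np.vstack([data_rng.normal(0.0, 0.1, (50, 2)),
               data_rng.normal(10.0, 0.1, (50, 2))])

centroids, labels, n_iter = lloyd_kmeans(X, k=2)
print("iterations used:", n_iter)
```

Every pass through the `for` loop touches all 100 points once, and `max_iter` only caps how many such passes are allowed.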

mbednarski answered Oct 04 '22 06:10


Yes, you are misinterpreting the parameter.

One iteration is one pass over the entire data set. If you have 100 objects, one iteration assigns 100 points; if you have 10,000 objects, one iteration processes 10,000 objects. max_iter limits how many such passes are made, not how many objects are handled.

There are more clever algorithms, but sklearn's k-means processes every object in every iteration.
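You can check this yourself by comparing max_iter with the fitted n_iter_ attribute, which reports how many iterations the run actually used. A small sketch, assuming random 2-D data and arbitrary choices for n_clusters, n_init and random_state:

```python
import numpy as np
from sklearn.cluster import KMeans

# 10,000 random 2-D points (synthetic data for illustration).
rng = np.random.default_rng(0)
X = rng.random((10_000, 2))

# Default-sized budget: the run stops as soon as it converges.
km = KMeans(n_clusters=5, n_init=1, max_iter=300, random_state=0).fit(X)
print("iterations used:", km.n_iter_)
print("points labelled:", km.labels_.shape[0])

# Tiny budget: the run is cut off after at most 2 passes over the data.
capped = KMeans(n_clusters=5, n_init=1, max_iter=2, random_state=0).fit(X)
print("iterations used with max_iter=2:", capped.n_iter_)
```

All 10,000 points receive a label in both runs; max_iter only bounds the number of passes, so a value that is too small can leave the centroids short of convergence.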

Has QUIT--Anony-Mousse answered Oct 04 '22 06:10