I'd like to understand the parameter max_iter from the class sklearn.cluster.KMeans.
According to the documentation:
max_iter : int, default: 300
Maximum number of iterations of the k-means algorithm for a single run.
But in my opinion, if I have 100 objects the code must run 100 times, and if I have 10,000 objects it must run 10,000 times, to classify every object. On the other hand, it makes no sense to run over all objects several times.
What is my misconception and how do I have to interpret this parameter?
max_iter : int, default=300
Maximum number of iterations of the k-means algorithm for a single run.
tol : float, default=1e-4
Relative tolerance with regards to Frobenius norm of the difference in the cluster centers of two consecutive iterations to declare convergence.
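To see how these two parameters interact in practice, here is a small sketch. The synthetic data from make_blobs and the chosen sizes are my own assumptions for illustration, not part of the question. After fitting, the n_iter_ attribute reports how many iterations the best run actually used; it can be far below max_iter when the tol criterion is met early.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data (an assumption for this sketch): 10,000 objects in 3 blobs.
X, _ = make_blobs(n_samples=10_000, centers=3, random_state=0)

# max_iter caps the number of iterations per run; tol can stop a run earlier.
km = KMeans(n_clusters=3, max_iter=300, tol=1e-4, n_init=10, random_state=0).fit(X)

print("iterations used:", km.n_iter_)        # never more than max_iter
print("labels assigned:", km.labels_.shape)  # one label per object
```

Every object gets a label regardless of how many iterations were run; max_iter only limits how often the centroids are refined.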
Take a look here:
https://www.naftaliharris.com/blog/visualizing-k-means-clustering/
Each time you click update centroids, a new iteration is performed. It makes sense, because when centroids are moved, distances to those centroids also change and some points may change cluster.
Yes, you are misinterpreting the parameter.
One iteration is one pass over the entire data set. If you have 100 objects, one iteration assigns 100 points. If you have 10,000 objects, one iteration processes 10,000 objects.
There are cleverer algorithms, but sklearn's k-means processes every object in every iteration.
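The idea can be sketched in plain Python. This is a toy version of the standard k-means loop on 1-D points, not sklearn's actual implementation; the function name kmeans, the sample data, and the simplified absolute tol check are assumptions made for illustration.

```python
import random

def kmeans(points, k, max_iter=300, tol=1e-4, seed=0):
    """Toy k-means on 1-D points; returns (centers, labels, n_iter)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # pick k distinct points as initial centers
    labels = [0] * len(points)
    n_iter = 0
    for n_iter in range(1, max_iter + 1):
        # One iteration = one pass over ALL points: assign each to its nearest center.
        labels = [min(range(k), key=lambda j: (p - centers[j]) ** 2) for p in points]
        # Then move each center to the mean of the points assigned to it.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members else centers[j])
        # Simplified convergence check (sklearn's tol is a relative tolerance).
        shift = sum((a - b) ** 2 for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break  # centers stopped moving, so further passes would change nothing
    return centers, labels, n_iter

points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9, 9.0, 9.1, 8.8]
centers, labels, n_iter = kmeans(points, k=3)
print(n_iter, "iterations for", len(points), "points")
```

The number of iterations is independent of the number of objects: the loop repeats because moving the centers can change which center is nearest to a point, and max_iter just bounds how many such refinement passes are allowed.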