Scikit-learn, KMeans: How to use max_iter

Tags: python, parameters, cluster-analysis, k-means, scikit-learn

I'd like to understand the parameter max_iter from the class sklearn.cluster.KMeans.

According to the documentation:

max_iter : int, default: 300
Maximum number of iterations of the k-means algorithm for a single run.

But in my opinion, if I have 100 objects the code must run 100 times, and if I have 10,000 objects it must run 10,000 times, to classify every object. On the other hand, it makes no sense to run over all objects several times.

What is my misconception and how do I have to interpret this parameter?

asked Dec 01 '16 10:12

C-Jay

2 Answers

Take a look here:

https://www.naftaliharris.com/blog/visualizing-k-means-clustering/

Each time you click update centroids, a new iteration is performed. It makes sense, because when centroids are moved, distances to those centroids also change and some points may change cluster.
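
To make the "update centroids" loop concrete, here is a minimal sketch of plain k-means (Lloyd's algorithm) in pure Python. It is not scikit-learn's implementation; the function name `kmeans`, the naive random initialisation, and the toy data are assumptions made for illustration. The point is that one iteration is a full pass over all points followed by a centroid update, and `max_iter` only caps how many such rounds are allowed.

```python
import math
import random


def kmeans(points, k, max_iter=300, seed=0):
    """Toy k-means on a list of (x, y) tuples (hypothetical helper).

    Returns (centroids, labels, iterations_used).
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # naive init: k random data points
    labels = [None] * len(points)

    for iteration in range(1, max_iter + 1):
        # Assignment step: ONE pass over ALL points.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]

        # Converged: no point changed cluster, so more passes change nothing.
        if new_labels == labels:
            return centroids, labels, iteration
        labels = new_labels

        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )

    # Stopped by the cap, not by convergence.
    return centroids, labels, max_iter


# Toy data: two well-separated blobs of 50 points each.
data_rng = random.Random(1)
points = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(50)]
points += [(data_rng.gauss(10, 1), data_rng.gauss(10, 1)) for _ in range(50)]

centroids, labels, n_iter = kmeans(points, k=2, max_iter=300)
print(len(points), "points, iterations used:", n_iter)

# With a cap of 1, the loop stops after a single assign/update round.
_, capped_labels, capped_iter = kmeans(points, k=2, max_iter=1)
print("capped run, iterations used:", capped_iter)
```

Every point is labelled in each iteration, so the number of iterations is independent of the number of points; on easy data like this the loop usually exits well before the cap.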

answered Oct 04 '22 06:10

mbednarski

Yes, you are misinterpreting the parameter.

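One iteration is one pass over the entire data set. If you have 100 objects, one iteration assigns all 100 points; if you have 10,000 objects, one iteration processes all 10,000. k-means repeats such passes until the centroids stop moving, and max_iter is the upper limit on how many passes a single run may take.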

There are more clever algorithms, but sklearn's k-means processes every object in every iteration.
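
You can check this yourself with a small sketch like the one below. The synthetic data from `make_blobs` and the chosen parameter values are assumptions for illustration; `n_iter_` is the fitted attribute that reports how many iterations the best run actually used.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# 10,000 synthetic samples in 3 blobs (illustrative data only).
X, _ = make_blobs(n_samples=10_000, centers=3, random_state=0)

# Default-sized cap: the run normally stops early once it has converged.
km = KMeans(n_clusters=3, max_iter=300, n_init=10, random_state=0).fit(X)
print("samples:", X.shape[0])
print("labels assigned:", len(km.labels_))
print("iterations used:", km.n_iter_)

# A very small cap: every sample still gets a label, the run just stops sooner.
capped = KMeans(n_clusters=3, max_iter=2, n_init=10, random_state=0).fit(X)
print("iterations used with max_iter=2:", capped.n_iter_)
print("labels assigned with max_iter=2:", len(capped.labels_))
```

In both fits all 10,000 samples receive a label; only the number of refinement passes differs, and it never exceeds max_iter.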

answered Oct 04 '22 06:10

Has QUIT--Anony-Mousse
