Is there an online version of the k-means clustering algorithm? By online I mean that every data point is processed serially, one at a time as it enters the system, which saves computing time in real-time use. I have written one myself with good results, but I would really prefer to have something "standardized" to refer to, since it is to be used in my master's thesis. Also, does anyone have advice on other online clustering algorithms? (lmgtfy failed ;))

Online k-means clustering

1 Answer

Yes, there is. Google failed to find it because it's more commonly known as "sequential k-means".

You can find two pseudo-code implementations of sequential k-means in this section of some Princeton CS class notes by Richard Duda. I've reproduced one of the two implementations below:

    Make initial guesses for the means m1, m2, ..., mk
    Set the counts n1, n2, ..., nk to zero
    Until interrupted
        Acquire the next example, x
        If mi is closest to x
            Increment ni
            Replace mi by mi + (1/ni)*(x - mi)
        end_if
    end_until
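
The "Replace" step is the ordinary arithmetic mean written incrementally. As a short check (the superscripts and the x_j notation are my own, not from the notes): if m_i^{(n-1)} is the mean of the first n-1 points assigned to cluster i and x_n is the n-th such point, then

```latex
\begin{aligned}
m_i^{(n)} &= \frac{1}{n}\sum_{j=1}^{n} x_j
           = \frac{1}{n}\Bigl((n-1)\,m_i^{(n-1)} + x_n\Bigr) \\
          &= m_i^{(n-1)} + \frac{1}{n}\bigl(x_n - m_i^{(n-1)}\bigr)
\end{aligned}
```

For n = 1 the factor (n-1) is zero, so the first point assigned to a cluster replaces the initial guess for that mean.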

The beautiful thing about it is that you only need to remember the mean of each cluster and the count of the number of data points assigned to the cluster. Once you update those two variables, you can throw away the data point.
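
Here is a minimal Python sketch of that pseudocode, to show how little state is kept. It assumes Euclidean distance and points given as plain lists of floats. The class name `SequentialKMeans` and its `update` method are illustrative names of my own, not something from Duda's notes.

```python
import math
from typing import List, Sequence


class SequentialKMeans:
    """Sequential (online) k-means: stores only a mean and a count per cluster."""

    def __init__(self, initial_means: Sequence[Sequence[float]]) -> None:
        # Initial guesses for the means m1..mk; the counts n1..nk start at zero.
        self.means: List[List[float]] = [list(m) for m in initial_means]
        self.counts: List[int] = [0] * len(self.means)

    def update(self, x: Sequence[float]) -> int:
        """Assign x to its closest mean, update that mean, and return its index."""
        # Index i of the mean closest to x (Euclidean distance is an assumption).
        i = min(range(len(self.means)), key=lambda j: math.dist(self.means[j], x))
        self.counts[i] += 1
        n = self.counts[i]
        # m_i <- m_i + (1/n_i) * (x - m_i)
        self.means[i] = [m + (xi - m) / n for m, xi in zip(self.means[i], x)]
        return i


model = SequentialKMeans(initial_means=[[0.0, 0.0], [10.0, 10.0]])
for point in [[1.0, 1.0], [9.0, 11.0], [3.0, 1.0], [11.0, 9.0]]:
    model.update(point)  # the point itself can be discarded after this call
```

After the loop, `model.means` and `model.counts` are the only state left. Because the counts start at zero, the first point assigned to a cluster overwrites the initial guess for its mean.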

I'm not sure where you would be able to find a citation for it. I would start looking in Duda's classic text Pattern Classification and Scene Analysis or the newer edition Pattern Classification. If it's not there, you could try Chris Bishop's newest book or Daphne Koller and Nir Friedman's recent text.

answered Sep 29 '22 16:09

qdjm

Tags:

cluster-analysis

k-means

Theodor
