 

Improving k-means clustering

My lecture notes on computer vision mention that the performance of the k-means clustering algorithm can be improved if we know the standard deviation of the clusters. How so?

My thinking is that we can use the standard deviations to come up with a better initial estimate through histogram-based segmentation first. What do you think? Thanks for any help!
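To make the idea concrete, here is a rough sketch of the kind of seeding I have in mind for 1-D data such as grayscale intensities (the helper name, bin count, and peak rule are all just illustrative, not an established method):

```python
import numpy as np

# Illustrative sketch: seed k-means with the k most populated histogram
# peaks instead of random points (1-D data, e.g. grayscale intensities).
def histogram_seeds(data, k, bins=64):
    counts, edges = np.histogram(data, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2          # bin midpoints
    # indices of local maxima of the histogram
    peaks = [i for i in range(1, bins - 1)
             if counts[i] >= counts[i - 1] and counts[i] > counts[i + 1]]
    peaks.sort(key=lambda i: counts[i], reverse=True)
    return centers[peaks[:k]]                       # k tallest peaks as seeds

data = np.concatenate([np.random.normal(0, 1, 500),
                       np.random.normal(8, 1, 500)])
print(histogram_seeds(data, k=2))                   # roughly [0, 8]
```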

asked Jan 10 '11 by Dhruv Gairola

People also ask

How can you improve the performance of a clustering method?

Graph-based clustering performance can easily be improved by applying ICA blind source separation during the graph Laplacian embedding step. Applying unsupervised feature learning to the input data using either RICA or SFT also improves clustering performance.

What are the main weaknesses of K-means clustering?

The most important limitations of Simple k-means are:

  • The user has to specify k (the number of clusters) in advance.
  • k-means can only handle numerical data.
  • k-means assumes that we deal with spherical clusters and that each cluster has roughly equal numbers of observations.

How do I make k-means faster?

A primary method of accelerating k-means is applying geometric knowledge to avoid computing point-center distances when possible. Elkan's algorithm [8] exploits the triangle inequality to avoid many distance computations, and is the fastest current algorithm for high-dimensional data.
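The core bound is simple: by the triangle inequality, if d(c1, c2) >= 2·d(x, c1), then d(x, c2) >= d(x, c1), so c2 can be skipped. Here is a minimal sketch of just that test (my own illustration, not Elkan's full algorithm, which also maintains upper and lower bounds across iterations):

```python
import numpy as np

# Illustration of the triangle-inequality pruning test only, not Elkan's
# full algorithm: if d(best, c_j) >= 2 * d(x, best), then
# d(x, c_j) >= d(x, best), so c_j can never be the closest center.
def closest_center(x, centers):
    cc = np.linalg.norm(centers[:, None] - centers[None, :], axis=2)
    best, best_d = 0, np.linalg.norm(x - centers[0])
    for j in range(1, len(centers)):
        if cc[best, j] >= 2 * best_d:
            continue                      # pruned without computing d(x, c_j)
        d = np.linalg.norm(x - centers[j])
        if d < best_d:
            best, best_d = j, d
    return best

centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
print(closest_center(np.array([1.0, 0.5]), centers))  # -> 0
```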

How do you optimize the objective function of the K-means clustering algorithm?

The k-means algorithm alternates two steps: for a fixed set of centroids (prototypes), optimize the assignment by mapping each sample to its closest centroid under Euclidean distance; then update each centroid to the average of all the samples assigned to it.
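A minimal NumPy sketch of that alternation (my own illustrative code, not tied to any particular library):

```python
import numpy as np

# Minimal k-means sketch: alternate assignment and centroid updates.
def kmeans(X, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Step 1: assign each sample to its closest centroid (Euclidean).
        dists = np.linalg.norm(X[:, None] - centroids[None, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 2: move each centroid to the mean of its assigned samples.
        new = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):   # converged
            break
        centroids = new
    return centroids, labels
```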


1 Answer

Your lecturer might have the 2002 paper by Veenman et al. in mind. The basic idea is that you set the maximum variance you allow in each cluster. You start with as many clusters as data points, and then you "evolve" clusters by:

  • merging neighboring clusters if the resulting cluster's variance is below the threshold
  • isolating elements that are "far" if a cluster's variance is above the threshold
  • or moving some elements between neighboring clusters if it decreases the sum of squared errors

(This evolution acts as a global optimization procedure and avoids the bad consequences of the initial assignment of cluster means that you get in plain k-means.)

To sum up: if you know the variance, you know how varied the clusters should be, so it becomes easier to, for example, detect outliers (which usually should be put into separate clusters).
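To make that concrete, here is a toy sketch of just the merge step for 1-D data (my own illustration, not the paper's actual procedure, which also isolates far elements and moves points between clusters). Note how the far-away point ends up in its own cluster:

```python
import numpy as np

# Toy sketch of the merge step only: greedily merge the pair of clusters
# whose joint variance is smallest, as long as it stays below max_var
# (which plays the role of the known cluster variance).
def merge_step_clustering(points, max_var):
    clusters = [[float(p)] for p in points]   # start: one cluster per point
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                v = np.var(clusters[i] + clusters[j])  # variance after merging
                if v <= max_var and (best is None or v < best[0]):
                    best = (v, i, j)
        if best is None:            # no merge keeps the variance low enough
            break
        _, i, j = best
        clusters[i] += clusters.pop(j)         # pop(j) is safe since j > i
    return clusters

data = [0.1, 0.2, 0.3, 5.0, 5.1, 9.9]
print(merge_step_clustering(data, max_var=0.5))
# -> [[0.1, 0.2, 0.3], [5.0, 5.1], [9.9]]  (the outlier is isolated)
```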

answered Sep 26 '22 by ang mo