I was required to write a bisecting k-means algorithm, but I didn't understand the algorithm. I know the k-means algorithm. Can you explain this one, but not in academic language? Thanks.


Bisecting k-means clustering algorithm explanation


The idea is to split your cloud of points into two parts, over and over. In other words, you build a random binary tree where each split (a node with two children) corresponds to dividing the points of a cloud into two.

You begin with a cloud of points.

Compute its centroid (barycenter) w
Select a point cL at random among the points of the cloud
Construct the point cR as the mirror image of cL through w (the segment cL->w is the same as w->cR, so w is the midpoint of cL and cR)
Separate the points of your cloud into two: the ones closest to cL belong to a subcloud L, and the ones closest to cR belong to a subcloud R
Repeat for the subclouds L and R
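The steps above can be sketched in Python for a single split. Since the segment cL->w equals w->cR, the mirrored point is cR = 2w - cL, coordinate by coordinate. This is a minimal sketch assuming points are tuples of floats; the names `centroid` and `split` are hypothetical, and sending a point that is equally close to cL and cR into L is an assumption the answer does not specify. Many bisecting k-means implementations also refine each split with a few 2-means iterations; this sketch does only the single split described here.

```python
import math
import random


def centroid(points):
    """Barycenter w: the coordinate-wise mean of the points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def split(points, rng=random):
    """One bisection step. Returns (w, L, R).

    L holds the points closest to cL, R those closest to cR.
    Assumption: a point equidistant from cL and cR goes to L.
    """
    w = centroid(points)
    c_left = rng.choice(points)                                 # random point cL of the cloud
    c_right = tuple(2 * wi - ci for wi, ci in zip(w, c_left))   # mirror of cL through w: cR = 2w - cL
    left, right = [], []
    for p in points:
        if math.dist(p, c_left) <= math.dist(p, c_right):
            left.append(p)
        else:
            right.append(p)
    return w, left, right


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    w, left, right = split(pts, random.Random(0))
    print(w)  # (5.0, 0.5)
    print(sorted(left), sorted(right))
```

Whichever point is drawn as cL here, the two points at x = 0 end up in one subcloud and the two at x = 10 in the other.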

Notes:

You can discard the random points cL and cR once you've used them. However, keep the centroids of all the subclouds.

Stop when your subclouds contain exactly one point.
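Putting the steps and the notes together, the whole procedure is a recursion that keeps the centroid of every subcloud and stops at single points. A minimal sketch, assuming points are tuples of floats; `Node`, `barycenter`, `build_tree` and `leaves` are hypothetical names. Two assumptions go beyond the answer: a subcloud made only of copies of one point is also treated as a leaf, since it cannot be split, and if cL happens to coincide with w (so cR = cL and every point lands on one side) another random point is tried as cL.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    centroid: tuple                                # w of this subcloud, kept as the notes say
    points: list                                   # the subcloud itself
    children: list = field(default_factory=list)   # [L, R], or [] for a leaf


def barycenter(points):
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def build_tree(points, rng=random):
    """Recursively bisect the cloud until every subcloud holds one point."""
    node = Node(barycenter(points), list(points))
    if len(set(points)) <= 1:      # one point (assumption: or copies of one point)
        return node
    candidates = list(points)
    rng.shuffle(candidates)        # try the points in random order as cL
    for c_left in candidates:
        c_right = tuple(2 * w - c for w, c in zip(node.centroid, c_left))  # cR = 2w - cL
        left, right = [], []
        for p in points:
            if math.dist(p, c_left) <= math.dist(p, c_right):
                left.append(p)
            else:
                right.append(p)
        if left and right:         # a real split; otherwise cL was (almost) w, try another
            node.children = [build_tree(left, rng), build_tree(right, rng)]
            break
    return node


def leaves(node):
    """The one-point subclouds at the bottom of the tree."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]
```

Because the tree is random it can be very unbalanced, so this recursive sketch is meant for small clouds; a large input could hit Python's default recursion limit.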

If you want k clusters, pick k nodes of the tree whose subclouds together contain every point of the initial cloud exactly once, and take their k centroids. You can do much more elaborate things if you want (minimizing the variance of the subclouds, etc.). Suppose you want 4 clusters (a power of two, for convenience). Then you only need to cut your cloud in two, and then cut each subcloud in two. If you want 8 clusters, cut each of these subclouds in two once more. And again for 16 clusters.
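For a power of two k, cutting every subcloud log2(k) times amounts to reading one level of the tree. A minimal sketch, assuming the tree is stored as hypothetical `Node` objects holding a centroid, the subcloud's points and its two children (built by hand here so the block stands alone). Assumption: a one-point subcloud reached before that level is kept as it is, so a very small cloud may yield fewer than k clusters.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    centroid: tuple                                # centroid of the subcloud
    points: list                                   # the subcloud itself
    children: list = field(default_factory=list)   # [L, R], or [] for a leaf


def clusters_power_of_two(root, k):
    """Cut every subcloud log2(k) times and return the resulting nodes."""
    if k < 1 or k & (k - 1):
        raise ValueError("k must be a power of two")
    level = [root]
    for _ in range(k.bit_length() - 1):            # log2(k) rounds of cutting
        # a leaf has nothing to cut, so it stays in the list unchanged
        level = [c for n in level for c in (n.children or [n])]
    return level


# Hand-built tree for four points: one cut gives 2 clusters, two cuts give 4.
a, b, c, d = (0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)
left = Node((0.0, 1.0), [a, b], [Node(a, [a]), Node(b, [b])])
right = Node((10.0, 1.0), [c, d], [Node(c, [c]), Node(d, [d])])
root = Node((5.0, 1.0), [a, b, c, d], [left, right])

two = clusters_power_of_two(root, 2)
print([n.centroid for n in two])   # [(0.0, 1.0), (10.0, 1.0)]
# The chosen subclouds together contain every point exactly once.
print(sorted(p for n in two for p in n.points) == sorted(root.points))  # True
```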

If you want K clusters with K not a power of 2 (let's say 24), look at the largest power of two below K. It's 16, so you still lack 8 clusters. Each "level-16-cluster" is the centroid of a "level-16-subcloud". Take 8 of the "level-16-clusters" (at random, for example) and replace each of them with its two "child" "level-32-clusters". (These two child "level-32-clusters" correspond to two "level-32-subclouds" that together make up the parent "level-16-subcloud".) That gives 8 untouched clusters plus 2 × 8 children, i.e. 24 clusters.
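The 24-cluster recipe can be sketched the same way: go down to the largest power of two P ≤ K, then replace K - P of those level-P clusters, chosen at random, by their two children. A minimal sketch with a hypothetical `Node` type that only keeps a centroid and its children; it assumes the tree is deep enough, i.e. at least K - P of the level-P subclouds still have children (otherwise `random.sample` raises `ValueError`). In the demo, centroids are replaced by path labels such as `(0, 1)` so the result is easy to read.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    centroid: tuple                                # centroid of the subcloud
    children: list = field(default_factory=list)   # [L, R], or [] for a leaf


def k_clusters(root, k, rng=random):
    """Centroids of k clusters (k >= 1), following the recipe above."""
    p = 1 << (k.bit_length() - 1)                  # largest power of two <= k (16 for k = 24)
    level = [root]
    for _ in range(p.bit_length() - 1):            # log2(p) rounds of cutting
        level = [c for n in level for c in (n.children or [n])]
    splittable = [i for i, n in enumerate(level) if n.children]
    chosen = set(rng.sample(splittable, k - p))    # the k - p level-p clusters to split
    result = []
    for i, n in enumerate(level):
        result.extend(n.children if i in chosen else [n])
    return [n.centroid for n in result]


def full_tree(depth, label=()):
    """Hypothetical complete tree for the demo; centroids are path labels, not points."""
    if depth == 0:
        return Node(label)
    return Node(label, [full_tree(depth - 1, label + (0,)), full_tree(depth - 1, label + (1,))])


print(len(k_clusters(full_tree(5), 24, random.Random(0))))  # 24
```

Random choice is what the answer suggests; choosing the level-P subclouds to split by size or spread instead would be one of the "more elaborate" options it mentions.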

answered Oct 18 '22 23:10

B. Decoster

Tags: algorithm, cluster-analysis, k-means

Nir
