I was required to write a bisecting k-means algorithm, but I didnt understand the algorithm. I know k-means algorithm.
Can you explain the algorithm, but not in academic language
Thanks.
K-means clustering algorithm computes the centroids and iterates until we it finds optimal centroid. It assumes that the number of clusters are already known. It is also called flat clustering algorithm. The number of clusters identified from data by algorithm is represented by 'K' in K-means.
It calculates the sum of the square of the points and calculates the average distance. When the value of k is 1, the within-cluster sum of the square will be high. As the value of k increases, the within-cluster sum of square value will decrease.
Algorithmic steps for k-means clustering 1) Randomly select 'c' cluster centers. 2) Calculate the distance between each data point and cluster centers. 3) Assign the data point to the cluster center whose distance from the cluster center is minimum of all the cluster centers..
The idea is iteratively splitting your cloud of points in 2 parts. In other words, you build a random binary tree where each splitting (a node with two children) corresponds to splitting the points of your cloud in 2.
You begin with a cloud of points.
Compute its centroid (barycenter) w
Select a point at random cL among the points of the cloud
Construct the point cR as the symmetric point of cL when compared to w (the segment cL->w is the same as w->cR)
Separate the points of your cloud in two, the ones closest to cR belong to a subcloud R, and the ones closest to cL belongs to the subcloud L
Reiterate for the subclouds R and L
Notes :
You can discard the random points once you've used them. However, keep the centroids of all the subcoulds.
Stop when your subclouds contain exactly one point.
If you want k clusters, just take k centroids such that they contain all the points of the initial cloud. You can do much more elaborate stuff if you want (minimizing variance of the clouds, etc...) Suppose you want 4 clusters (a power of two for convenience) Then you only need to cut you cloud in two, and then cut each subclouds in two. If you want 8 clusters, then cut again these subclouds once in two. And again for 16 clusters.
If you want K clusters with K not a power of 2 (let's say 24) then look at the closest inferior power of two. It's 16. You still lack 8 clusters. Each "level-16-cluster" is the centroid of a "level-16-subcloud". What you'll do is take 8 "level-16-clusters" (at random for example) and replace them each with the two "child" "level-32-clusters". (These two child "level-32-clusters" correspond to two "level-32-subclouds" that add up to the parent "level-16-subcloud")
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With