Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Weighted k-means in python

Tags:

python

k-means

After reading this post here about duplicate values in k-means clustering, I realized I cannot simply use unique points for clustering.

https://stats.stackexchange.com/questions/152808/do-i-need-to-remove-duplicate-objects-for-cluster-analysis-of-objects

I have over 10000000 points, though only 8000 unique ones. Therefore, I initially thought that for speeding it up, I’d use unique points only. Seems like this is a bad idea.

To keep computational time down, this post suggests to add weights to each point. How can this be implemented in python?

like image 773
Keshav M Avatar asked May 23 '26 16:05

Keshav M


1 Answers

Using K-Means package from Scikit library, clustering is performed for number of clusters as 11 here. The array Y contains data that has been inserted as weights where as X has actual points that need to be clustered.

from sklearn.cluster import KMeans  #For applying KMeans
##--------------------------------------------------------------------------------------------------------##
#Starting k-means clustering


kmeans = KMeans(n_clusters=11, n_init=10, random_state=0, max_iter=1000)

#Running k-means clustering and enter the ‘X’ array as the input coordinates and ‘Y’ 
array as sample weights
wt_kmeansclus = kmeans.fit(X,sample_weight = Y)
predicted_kmeans = kmeans.predict(X, sample_weight = Y)

#Storing results obtained together with respective city-state labels
kmeans_results = 
pd.DataFrame({"label":data_label,"kmeans_cluster":predicted_kmeans+1})


#Printing count of points alloted to each cluster and then the cluster centers
print(kmeans_results.kmeans_cluster.value_counts())
like image 113
Utkarsh Agarwal Avatar answered May 25 '26 06:05

Utkarsh Agarwal



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!