Weighted k-means in python

Question

After reading this post here about duplicate values in k-means clustering, I realized I cannot simply use unique points for clustering.

https://stats.stackexchange.com/questions/152808/do-i-need-to-remove-duplicate-objects-for-cluster-analysis-of-objects

I have over 10000000 points, though only 8000 unique ones. Therefore, I initially thought that for speeding it up, I’d use unique points only. Seems like this is a bad idea.

To keep computational time down, this post suggests to add weights to each point. How can this be implemented in python?

Utkarsh Agarwal · Accepted Answer

Using K-Means package from Scikit library, clustering is performed for number of clusters as 11 here. The array Y contains data that has been inserted as weights where as X has actual points that need to be clustered.

from sklearn.cluster import KMeans  #For applying KMeans
##--------------------------------------------------------------------------------------------------------##
#Starting k-means clustering


kmeans = KMeans(n_clusters=11, n_init=10, random_state=0, max_iter=1000)

#Running k-means clustering and enter the ‘X’ array as the input coordinates and ‘Y’ 
array as sample weights
wt_kmeansclus = kmeans.fit(X,sample_weight = Y)
predicted_kmeans = kmeans.predict(X, sample_weight = Y)

#Storing results obtained together with respective city-state labels
kmeans_results = 
pd.DataFrame({"label":data_label,"kmeans_cluster":predicted_kmeans+1})


#Printing count of points alloted to each cluster and then the cluster centers
print(kmeans_results.kmeans_cluster.value_counts())

Weighted k-means in python

Tags:

python

k-means

Keshav M

1 Answers

Utkarsh Agarwal

Recent Activity

Donate For Us

Weighted k-means in python

Tags:

python

k-means

Keshav M

1 Answers

Utkarsh Agarwal

Related questions

Recent Activity

Donate For Us