I am using the KMeans class from sklearn.cluster. Once the clustering is finished, how can I find out which values were grouped together?
Say I had 100 data points and KMeans gave me 5 clusters. Now I want to know which data points are in cluster 5. How can I do that?
Is there a function that takes a cluster id and lists all the data points in that cluster?
The "cluster center" is the arithmetic mean of all the points belonging to the cluster. Each point is closer to its own cluster center than to other cluster centers.
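Both properties can be checked by hand on a toy example. The sketch below uses only the standard library; the points, the labels, and the helper names `cluster_centers` and `nearest_center` are made up for illustration and are not part of scikit-learn.

```python
from math import dist

# Hypothetical 2-D points and their cluster assignments (illustrative only).
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
labels = [0, 0, 0, 1, 1, 1]


def cluster_centers(points, labels):
    """Return {cluster_id: center}; each center is the coordinate-wise mean."""
    members = {}
    for point, label in zip(points, labels):
        members.setdefault(label, []).append(point)
    return {
        label: tuple(sum(coords) / len(pts) for coords in zip(*pts))
        for label, pts in members.items()
    }


def nearest_center(point, centers):
    """Return the id of the cluster whose center is closest to the point."""
    return min(centers, key=lambda label: dist(point, centers[label]))


centers = cluster_centers(points, labels)

# True when every point is closest to the center of its own cluster.
all_nearest = all(
    nearest_center(point, centers) == label
    for point, label in zip(points, labels)
)
```

With a fitted scikit-learn model, the centers are available as `km.cluster_centers_`, so the same check can be run against `km.labels_` instead of the hand-written labels.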
Python offers many useful tools for performing cluster analysis. The best tool to use depends on the problem at hand and the type of data available. Three widely used techniques are K-means clustering, Gaussian mixture models, and spectral clustering.
I had a similar requirement, and I am using pandas to create a new DataFrame with the index of the dataset and the cluster labels as columns.
import pandas as pd
from sklearn.cluster import KMeans
data = pd.read_csv('filename')
km = KMeans(n_clusters=5).fit(data)
cluster_map = pd.DataFrame()
cluster_map['data_index'] = data.index.values  # row index of each data point
cluster_map['cluster'] = km.labels_  # cluster id assigned to that row
Once the DataFrame is available, it is quite easy to filter. For example, to select all data points in cluster 3:
cluster_map[cluster_map.cluster == 3]
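If you would rather not build a DataFrame, the same lookup can be done straight from the label array. This is a minimal sketch, assuming `labels` stands in for `km.labels_`; the helper name `points_by_cluster` is hypothetical, not a scikit-learn function. Note that KMeans numbers its clusters from 0 to n_clusters - 1, so with 5 clusters the ids are 0 through 4.

```python
from collections import defaultdict


def points_by_cluster(labels):
    """Map each cluster id to the list of row positions assigned to it."""
    groups = defaultdict(list)
    for position, label in enumerate(labels):
        # int() turns NumPy integer labels into plain Python ints for the keys.
        groups[int(label)].append(position)
    return dict(groups)


# Stand-in for km.labels_ (made-up values for illustration).
labels = [0, 2, 1, 2, 0, 1, 2]

groups = points_by_cluster(labels)
cluster_2 = groups[2]  # row positions of the points assigned to cluster 2
```

The values are row positions, so `data.iloc[groups[2]]` would return the matching rows of the original DataFrame.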