How to get the samples in each cluster?

I am using KMeans from sklearn.cluster. Once the clustering is finished, how can I find out which values were grouped together?

Say I had 100 data points and KMeans gave me 5 clusters. Now I want to know which data points are in cluster 5. How can I do that?

Is there a function that takes a cluster id and lists all the data points in that cluster?

user77005 asked Mar 24 '16 07:03



People also ask

What is cluster_centers_?

The "cluster center" is the arithmetic mean of all the points belonging to the cluster. Each point is closer to its own cluster center than to other cluster centers.

How do you cluster a dataset in Python?

Python offers many useful tools for performing cluster analysis. The best tool to use depends on the problem at hand and the type of data available. Python features three widely used techniques: K-means clustering, Gaussian mixture models and spectral clustering.


1 Answer

I had a similar requirement, and I am using pandas to create a new DataFrame with the index of the dataset and the cluster labels as columns.

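    import pandas as pd
    from sklearn.cluster import KMeans

    data = pd.read_csv('filename')  # all columns must be numeric
    km = KMeans(n_clusters=5).fit(data)

    # Map each row index of the dataset to its assigned cluster label
    cluster_map = pd.DataFrame()
    cluster_map['data_index'] = data.index.values
    cluster_map['cluster'] = km.labels_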

Once the DataFrame is available, it is quite easy to filter. For example, to select all data points in cluster 3 (labels start at 0, so with 5 clusters they run from 0 to 4):

cluster_map[cluster_map.cluster == 3] 
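
If you would rather not build a DataFrame, the same lookup can probably be done directly on the labels_ array with NumPy. The following is only a sketch under assumptions: the random array X stands in for the 100 data points from the question, and samples_in_cluster is a hypothetical helper name, not something scikit-learn provides.

```python
import numpy as np
from sklearn.cluster import KMeans

# Assumption: synthetic stand-in for the 100 data points in the question.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))

km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)

def samples_in_cluster(X, labels, cluster_id):
    """Hypothetical helper: return (row indices, rows) assigned to cluster_id."""
    idx = np.where(labels == cluster_id)[0]
    return idx, X[idx]

# Labels are zero-based, so the "fifth" cluster has id 4.
idx, points = samples_in_cluster(X, km.labels_, 4)

# Row indices for every cluster at once, keyed by cluster id.
clusters = {c: np.where(km.labels_ == c)[0] for c in range(km.n_clusters)}
```

Here idx holds the positions of the matching rows in X and points holds the rows themselves, so you can go back to the original data either way.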
Praveen answered Sep 28 '22 17:09
