
Will pandas dataframe object work with sklearn kmeans clustering?

dataset is a pandas DataFrame, and KMeans is sklearn.cluster.KMeans:

    km = KMeans(n_clusters=n_Clusters)
    km.fit(dataset)
    prediction = km.predict(dataset)

This is how I decide which entity belongs to which cluster:

    for i in range(len(prediction)):
        cluster_fit_dict[dataset.index[i]] = prediction[i]
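The loop above pairs each index label with its predicted cluster by position, which works because the predictions come back in the same order as the rows. A minimal sketch of the same mapping written with zip, using made-up index labels and cluster numbers as stand-ins for dataset.index and prediction:

```python
# Hypothetical stand-ins for dataset.index and the array returned by km.predict(dataset).
index_labels = ["A", "B", "C"]
prediction = [0, 0, 1]

# Loop form, as in the question.
cluster_fit_dict = {}
for i in range(len(prediction)):
    cluster_fit_dict[index_labels[i]] = prediction[i]

# Equivalent form: pair labels and predictions positionally.
cluster_fit_zip = dict(zip(index_labels, prediction))

print(cluster_fit_zip)
```

Both forms rely on the row order being preserved between the input and the predictions.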

This is how dataset looks:

    A 1 2 3 4 5 6
    B 2 3 4 5 6 7
    C 1 4 2 7 8 1
    ...

where A, B, C are the index labels.

Is this the correct way of using k-means?

asked Jan 19 '15 02:01 by Dark Knight





1 Answer

Assuming all the values in the dataframe are numeric,

    import pandas
    import sklearn.cluster

    # Convert DataFrame to a NumPy matrix
    mat = dataset.values
    # Fit k-means with sklearn
    km = sklearn.cluster.KMeans(n_clusters=5)
    km.fit(mat)
    # Get cluster assignment labels (one per row, in row order)
    labels = km.labels_
    # Format results as a DataFrame of (index label, cluster) pairs
    results = pandas.DataFrame([dataset.index, labels]).T
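For a self-contained illustration, here is a minimal sketch that passes the DataFrame to KMeans directly and keeps the labels aligned with the index. It reuses the three rows shown in the question; the column names, n_clusters=2, n_init=10 and random_state=0 are assumptions picked for the example, not values from the original post.

```python
import pandas as pd
from sklearn.cluster import KMeans

# The three rows shown in the question; the column names are made up for illustration.
dataset = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6],
     [2, 3, 4, 5, 6, 7],
     [1, 4, 2, 7, 8, 1]],
    index=["A", "B", "C"],
    columns=["f1", "f2", "f3", "f4", "f5", "f6"],
)

# n_clusters=2 is an assumption chosen to suit three rows.
km = KMeans(n_clusters=2, n_init=10, random_state=0)

# fit_predict accepts the all-numeric DataFrame and returns one label per row, in row order.
labels = pd.Series(km.fit_predict(dataset), index=dataset.index, name="cluster")

print(labels)
```

Rows A and B should land in the same cluster and C in the other. The cluster numbers themselves are arbitrary, so only the grouping is meaningful.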

Alternatively, you could try KMeans++ for Pandas.

answered Sep 29 '22 12:09 by user666
