
How to add k-means predicted clusters in a column to a dataframe in Python

I have a question about k-means clustering in Python.

I ran the analysis like this:

from sklearn.cluster import KMeans

km = KMeans(n_clusters=12, random_state=1)
new = data._get_numeric_data().dropna(axis=1)
km.fit(new)
predict = km.predict(new)

How can I add the cluster results to my original dataframe "data" as an additional column? Thanks!

Keithx asked Jul 14 '16 10:07



1 Answer

Assuming predict has the same length as the columns of your dataframe df (that is, one label per row), all you need to do is this:

df['NEW_COLUMN'] = pd.Series(predict, index=df.index)
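
For a fuller picture, here is a minimal end-to-end sketch of the same idea. The toy dataframe, its column names, the "cluster" column name, and the choice of 3 clusters are all made up for illustration; select_dtypes is used here as a public stand-in for the question's _get_numeric_data(), and n_init is set explicitly only to keep behaviour consistent across scikit-learn versions.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical toy data standing in for the asker's `data`:
# one text column, two clean numeric columns, one numeric column with a NaN.
rng = np.random.RandomState(0)
data = pd.DataFrame(
    {
        "name": [f"row{i}" for i in range(30)],
        "x": rng.rand(30),
        "y": rng.rand(30),
        "z": [np.nan] + list(rng.rand(29)),
    },
    index=range(100, 130),  # deliberately not 0..n-1
)

# Keep numeric columns only, then drop any column that contains NaN.
# Rows are untouched, so `new` has the same index as `data`.
new = data.select_dtypes(include="number").dropna(axis=1)

km = KMeans(n_clusters=3, random_state=1, n_init=10)
labels = km.fit_predict(new)  # one label per row of `new`

# Wrap the labels in a Series carrying the index of the rows that were
# clustered, so pandas aligns each label with the right row of `data`.
data["cluster"] = pd.Series(labels, index=new.index)
```

Because dropna(axis=1) removes columns rather than rows, every row of data gets a label here. If rows had been dropped instead (for example with dropna(axis=0)), building the Series on new.index would still attach each label to the correct row, and the rows that were left out of the clustering would get NaN in the new column.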
Gal Dreiman answered Oct 20 '22 08:10