 

kmeans scatter plot: plot different colors per cluster


I am trying to make a scatter plot of k-means output that clusters sentences on the same topic together. The problem I am facing is plotting the points that belong to each cluster in their own color.

    sentence_list = ["Hi how are you", "Good morning" ...]  # I have 10 sentences
    km = KMeans(n_clusters=5, init='k-means++', n_init=10, verbose=1)  # with 5 clusters, I want 5 different colors
    km.fit(vectorized)
    km.labels_  # [0,1,2,3,3,4,4,5,2,5]

    pipeline = Pipeline([('tfidf', TfidfVectorizer())])
    X = pipeline.fit_transform(sentence_list).todense()
    pca = PCA(n_components=2).fit(X)
    data2D = pca.transform(X)
    plt.scatter(data2D[:,0], data2D[:,1])

    km.fit(X)
    centers2D = pca.transform(km.cluster_centers_)
    plt.hold(True)
    labels=np.array([km.labels_])
    print labels

My problem is with the plt.scatter() call at the bottom: what should I pass for the parameter c?

1. When I use c=labels in the code, I get this error:

number in rbg sequence outside 0-1 range

2. When I set c=km.labels_ instead, I get this error:

ValueError: Color array must be two-dimensional

    plt.scatter(centers2D[:,0], centers2D[:,1],
                marker='x', s=200, linewidths=3, c=labels)
    plt.show()
jxn asked Jan 30 '15 00:01





2 Answers

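    from sklearn.cluster import KMeans
    import matplotlib.pyplot as plt

    # Fit k-means on the feature matrix X from the question
    model = KMeans(n_clusters=5).fit(X)

    # Visualize it: data is the 2D projection of X (data2D in the question),
    # and each point is colored by its cluster label
    plt.figure(figsize=(8, 6))
    plt.scatter(data[:, 0], data[:, 1], c=model.labels_.astype(float))
    plt.show()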

Now each cluster is drawn in a different color.
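For reference, here is a minimal end-to-end sketch of this approach, applied to both the points and the cluster centers. The sentences are hypothetical stand-ins (the question's full list is not shown), and `random_state`, the `viridis` colormap, and the output file name are my own choices. It uses `.toarray()` instead of `.todense()` because recent scikit-learn versions reject `np.matrix` input. The errors in the question likely come from passing the 10 point labels (wrapped in an extra dimension by `np.array([km.labels_])`) as colors for only 5 center markers, so the sketch gives the centers one color value per cluster instead.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-in sentences; replace with your own list.
sentences = [
    "Hi how are you", "Good morning", "Good evening to you",
    "The stock market fell today", "Stocks rallied this morning",
    "The cat sat on the mat", "My dog chased the cat",
    "Rain is expected tomorrow", "Sunny weather all week",
    "How are you doing today",
]
n_clusters = 5

# Dense TF-IDF matrix as a plain ndarray (not np.matrix).
X = TfidfVectorizer().fit_transform(sentences).toarray()
km = KMeans(n_clusters=n_clusters, init="k-means++", n_init=10,
            random_state=0).fit(X)

# Project both the points and the cluster centers to 2D with the same PCA.
pca = PCA(n_components=2).fit(X)
points2d = pca.transform(X)
centers2d = pca.transform(km.cluster_centers_)

labels = km.labels_  # 1-D array, one integer label per sentence

fig, ax = plt.subplots(figsize=(8, 6))
# Sharing vmin/vmax makes a given label map to the same color in both calls.
color_scale = dict(cmap="viridis", vmin=0, vmax=n_clusters - 1)
ax.scatter(points2d[:, 0], points2d[:, 1], c=labels, **color_scale)
# The centers need one color value per center, not one per data point.
ax.scatter(centers2d[:, 0], centers2d[:, 1], c=np.arange(n_clusters),
           marker="x", s=200, linewidths=3, **color_scale)
fig.savefig("clusters.png")  # or plt.show() in an interactive session
```

The key point is that `c` must have the same length as the x and y arrays of that particular `scatter` call.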

Zhenye Na answered Sep 18 '22 15:09



The color= or c= property should be a matplotlib color, as mentioned in the documentation for plot.

To map an integer label to a color, just do

    LABEL_COLOR_MAP = {0 : 'r',
                       1 : 'k',
                       ....,
                       }

    label_color = [LABEL_COLOR_MAP[l] for l in labels]
    plt.scatter(x, y, c=label_color)

If you don't want to use the builtin one-character color names, you can use other color definitions. See the documentation on matplotlib colors.
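As a rough sketch of those other color definitions, the mapping can also be built automatically from a palette that mixes hex strings, Tableau names, CSS names, and one-character codes. The palette, the helper names `build_label_color_map` and `colors_for`, and the sample labels below are all hypothetical and only illustrate the idea.

```python
from itertools import cycle, islice

# Hypothetical palette: any valid matplotlib color strings would work here.
PALETTE = ["#1f77b4", "tab:orange", "forestgreen", "r", "k"]


def build_label_color_map(labels, palette=PALETTE):
    """Assign each distinct label a palette color, cycling if colors run out."""
    unique = sorted({int(label) for label in labels})
    return dict(zip(unique, islice(cycle(palette), len(unique))))


def colors_for(labels, color_map):
    """Return one color per label, in the same order as the labels."""
    return [color_map[int(label)] for label in labels]


# Sample labels standing in for km.labels_ (must be a flat, 1-D sequence).
labels = [0, 1, 2, 3, 3, 4, 4, 0, 2, 1]
color_map = build_label_color_map(labels)
label_color = colors_for(labels, color_map)
print(label_color)
```

The resulting `label_color` list can then be passed as `c=label_color` to `plt.scatter(x, y, ...)` exactly as above, provided it has one entry per plotted point.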

Hannes Ovrén answered Sep 22 '22 15:09
