
K-means Clustering in Python

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans


x = [916,684,613,612,593,552,487,484,475,474,438,431,421,418,409,391,389,388,
    380,374,371,369,357,356,340,338,328,317,316,315,313,303,283,257,255,254,245,
    234,232,227,227,222,221,221,219,214,201,200,194,169,155,140]

kmeans = KMeans(n_clusters=4)
a = kmeans.fit(np.reshape(x,(len(x),1)))
centroids = kmeans.cluster_centers_

labels = kmeans.labels_

print(centroids)
print(labels)

colors = ["g.","r.","y.","b."]

for i in range(len(x)):
    plt.plot(x[i], colors[labels[i]], markersize = 10)

plt.scatter(centroids[:, 0], marker = "x", s = 150, linewidths = 5, zorder = 10)
plt.show()

The code above displays 4 clusters, but they are definitely not what I want.

I also get an error, which makes it even worse. The output I get is in the picture below.

The error I get is: TypeError: scatter() missing 1 required positional argument: 'y'. The error is not a big deal, because I don't like what I have anyway.

Clusters Output

Below is an image of how I want my clusters to look.

Cluster I want

Master Mind asked Nov 01 '15 03:11



1 Answer

Your data is one-dimensional (points on a line). If you want a two-dimensional visualization like the picture in your post, you need two-dimensional or multi-dimensional data, for example [[1,3], [2,3], [1,5]]. After k-means the points are divided into k clusters, and you can use scatter to visualize the output. By the way, scatter takes both x and y, because it is a two-dimensional visualization.
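As a rough sketch of both points (not necessarily the only way to do it): the first part clusters the one-dimensional values from the question and draws them on a line by giving scatter a y of zero for every point, which avoids the TypeError. The second part runs k-means on two-dimensional points. The helper names (cluster_1d, plot_1d, cluster_2d_demo), the fixed seed, and the synthetic 2-D points are assumptions made up for illustration; they are not from the question.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# The one-dimensional values from the question.
x = [916, 684, 613, 612, 593, 552, 487, 484, 475, 474, 438, 431, 421, 418,
     409, 391, 389, 388, 380, 374, 371, 369, 357, 356, 340, 338, 328, 317,
     316, 315, 313, 303, 283, 257, 255, 254, 245, 234, 232, 227, 227, 222,
     221, 221, 219, 214, 201, 200, 194, 169, 155, 140]


def cluster_1d(values, k=4, seed=0):
    """Run k-means on 1-D values; return (labels, centroids). Hypothetical helper."""
    data = np.asarray(values, dtype=float).reshape(-1, 1)  # shape (n_samples, 1)
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(data)
    return km.labels_, km.cluster_centers_[:, 0]


def plot_1d(values, labels, centroids):
    """Draw 1-D clusters on a line. scatter needs x AND y, so y is all zeros."""
    values = np.asarray(values, dtype=float)
    plt.scatter(values, np.zeros_like(values), c=labels, cmap="viridis", s=40)
    plt.scatter(centroids, np.zeros_like(centroids), marker="x", c="black",
                s=150, linewidths=3, zorder=10)
    plt.yticks([])  # the y axis carries no information here
    plt.xlabel("value")


def cluster_2d_demo(k=3, seed=0):
    """Run k-means on synthetic 2-D points (made-up data, for illustration only)."""
    rng = np.random.default_rng(seed)
    centers = np.array([[1.0, 3.0], [6.0, 3.0], [3.0, 8.0]])
    # 30 noisy points around each assumed center -> array of shape (90, 2)
    points = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(points)
    return points, km.labels_, km.cluster_centers_


if __name__ == "__main__":
    labels, centroids = cluster_1d(x)
    print(centroids)
    plot_1d(x, labels, centroids)

    plt.figure()
    points, labels_2d, centers_2d = cluster_2d_demo()
    plt.scatter(points[:, 0], points[:, 1], c=labels_2d, cmap="viridis", s=40)
    plt.scatter(centers_2d[:, 0], centers_2d[:, 1], marker="x", c="black",
                s=150, linewidths=3, zorder=10)
    plt.show()
```

The 1-D plot will still look like points on a line, because that is all the data contains. A plot like the one in the question needs a second feature for each point.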

I suggest taking a look at Orange, a Python data mining tool. You can run k-means by drag and drop and easily visualize the output.

Good luck! Data mining is fun :-)

Fujiao Liu answered Sep 29 '22 13:09