
add labels to sklearn k-means

I am trying to use k-means in Python.

data = [[1,2,3,4,5],[1,0,3,2,4],[4,3,234,5,5],[23,4,5,1,4],[23,5,2,3,5]]

Each row of this data has a label. For example:

[1,2,3,4,5] -> Fiat1
[1,0,3,2,4] -> Fiat2
[4,3,234,5,5] -> Mercedes
[23,4,5,1,4] -> Opel
[23,5,2,3,5] -> bmw

from sklearn.cluster import KMeans

kmeans = KMeans(init='k-means++', n_clusters=3, n_init=10)
kmeans.fit(data)

After running KMeans, I want to obtain the labels of the data points that fall into each cluster.

A fake example:

Cluster 1: Fiat1, Fiat2

Cluster 2: Mercedes

Cluster 3: bmw, Opel

How can I do that?

asked Jul 17 '16 21:07 by dijiri



1 Answer

Code

from sklearn.cluster import KMeans
import numpy as np

data = np.array([[1,2,3,4,5],[1,0,3,2,4],[4,3,234,5,5],[23,4,5,1,4],[23,5,2,3,5]])
labels = np.array(['Fiat1', 'Fiat2', 'Mercedes', 'Opel', 'BMW'])
N_CLUSTERS = 3

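# Fit k-means; n_init=10 runs 10 initializations and keeps the best one
kmeans = KMeans(init='k-means++', n_clusters=N_CLUSTERS, n_init=10)
kmeans.fit(data)
# Cluster index assigned to each row of data
pred_classes = kmeans.predict(data)

# Print the labels of the rows assigned to each cluster
for cluster in range(N_CLUSTERS):
    print('cluster: ', cluster)
    print(labels[np.where(pred_classes == cluster)])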

Output (the cluster numbers are arbitrary and can differ between runs):

cluster:  0
['Opel' 'BMW']
cluster:  1
['Mercedes']
cluster:  2
['Fiat1' 'Fiat2']
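
If it is more convenient to keep the result around than to print it, the same grouping can be collected into a dictionary. This is only a sketch of one way to do it: the names `names` and `groups` are made up for this example, and `random_state=0` is an assumption added so that repeated runs give the same cluster numbers. It reads the assignments from the fitted model's `labels_` attribute.

```python
from collections import defaultdict

import numpy as np
from sklearn.cluster import KMeans

data = np.array([[1, 2, 3, 4, 5], [1, 0, 3, 2, 4], [4, 3, 234, 5, 5],
                 [23, 4, 5, 1, 4], [23, 5, 2, 3, 5]])
names = ['Fiat1', 'Fiat2', 'Mercedes', 'Opel', 'BMW']

# random_state is an assumption added here for reproducibility;
# it is not part of the original question or answer.
kmeans = KMeans(init='k-means++', n_clusters=3, n_init=10, random_state=0)
kmeans.fit(data)

# labels_ holds one cluster index per row of data, in the same order as names
groups = defaultdict(list)
for name, cluster_id in zip(names, kmeans.labels_):
    groups[int(cluster_id)].append(name)

for cluster_id in sorted(groups):
    print('cluster:', cluster_id, groups[cluster_id])
```

Which number each cluster gets depends on the initialization, so rely on the membership of each group, not on its index.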
answered Sep 27 '22 00:09 by sascha