How to print result of clustering in sklearn

I have a sparse matrix

from scipy.sparse import csr_matrix
M = csr_matrix((data_np, (rows_np, columns_np)))

Then I cluster it like this:

from sklearn.cluster import KMeans
km = KMeans(n_clusters=n, init='random', max_iter=100, n_init=1, verbose=1)
km.fit(M)

My question is a real beginner's one: how do I print the clustering result without any extra information? I don't care about plotting or distances. I just need the clustered rows, looking like this:

Cluster 1
row 1
row 2
row 3

Cluster 2
row 4
row 20
row 1000
...

How can I get it? Excuse me for this question.

asked Apr 22 '15 13:04 by thepolina




1 Answer

Time to help myself. After

km.fit(M)

we run

labels = km.predict(M)

which returns labels, a numpy.ndarray with one element per row of M. Each element is the number of the cluster that the corresponding row belongs to. For example, if the first element is 5, the first row belongs to cluster 5. Let's put our rows in a dictionary of lists that looks like this: {cluster_number: [row1, row2, row3], ...}

# row_dict stores what each row actually represents, indexed by row number
# (in my case the rows are Russian words)
clusters = {}
n = 0
for item in labels:
    if item in clusters:
        clusters[item].append(row_dict[n])
    else:
        clusters[item] = [row_dict[n]]
    n += 1

and print the result

for item in clusters:
    print("Cluster", item)
    for i in clusters[item]:
        print(i)
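
The grouping and printing steps can also be folded into one small helper. The following is only a sketch: group_rows_by_cluster, the sample labels and the row_names list are hypothetical stand-ins, not part of the original code. In real use labels would come from km.predict(M) (or from km.labels_ after fitting) and row_names would be whatever row_dict holds.

```python
from collections import defaultdict


def group_rows_by_cluster(labels, row_names):
    """Map each cluster label to the list of row names assigned to it."""
    clusters = defaultdict(list)
    for row_index, label in enumerate(labels):
        # int() turns NumPy integer labels into plain Python ints
        clusters[int(label)].append(row_names[row_index])
    return dict(clusters)


# Hypothetical stand-ins for the KMeans labels and for row_dict
labels = [1, 0, 1, 2, 0]
row_names = ["row 1", "row 2", "row 3", "row 4", "row 5"]

clusters = group_rows_by_cluster(labels, row_names)
for label in sorted(clusters):
    print("Cluster", label)
    for name in clusters[label]:
        print(name)
    print()  # blank line between clusters
```

Sorting the keys only makes the output order predictable; the cluster numbers themselves carry no meaning beyond identifying the group.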
answered Oct 06 '22 20:10 by thepolina