I have a sparse matrix
from scipy.sparse import *
M = csr_matrix((data_np, (rows_np, columns_np)));
then I'm doing clustering that way
from sklearn.cluster import KMeans
km = KMeans(n_clusters=n, init='random', max_iter=100, n_init=1, verbose=1)
km.fit(M)
and my question is extremely noob: how to print the clustering result without any extra information. I don't care about plotting or distances. I just need clustered rows looking that way
Cluster 1
row 1
row 2
row 3
Cluster 2
row 4
row 20
row 1000
...
How can I get it? Excuse me for this question.
Clustering Performance Evaluation Metrics Here clusters are evaluated based on some similarity or dissimilarity measure such as the distance between cluster points. If the clustering algorithm separates dissimilar observations apart and similar observations together, then it has performed well.
Time to help myself. After
km.fit(M)
we run
labels = km.predict(M)
which returns labels, numpy.ndarray. Number of elements in this array equals number of rows. And each element means that a row belongs to the cluster. For example: if first element is 5 it means that row 1 belongs to cluster 5. Lets put our rows in a dictionary of lists looking this way {cluster_number:[row1, row2, row3], ...}
# in row_dict we store actual meanings of rows, in my case it's russian words
clusters = {}
n = 0
for item in labels:
if item in clusters:
clusters[item].append(row_dict[n])
else:
clusters[item] = [row_dict[n]]
n +=1
and print the result
for item in clusters:
print "Cluster ", item
for i in clusters[item]:
print i
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With