
With SciPy, how do I get the clustering for a specific k when doing hierarchical clustering?

I am using fastcluster with SciPy to do agglomerative clustering. I can call dendrogram to plot the dendrogram for the clustering, and fcluster(Z, sqrt(D.max()), 'distance') gives a pretty good clustering for my data. But what if I want to manually inspect the region of the dendrogram where, say, k=3 (clusters), and then inspect k=6 (clusters)? How do I get the clustering at a specific level of the dendrogram?

I see all these functions with tolerances, but I don't understand how to convert from tolerance to number of clusters. I can manually build the clustering using a simple data set by going through the linkage (Z) and piecing the clusters together step by step, but this is not practical for large data sets.

demongolem asked Jul 12 '13 14:07



2 Answers

If you want to cut the tree at a specific level, then use:

fl = fcluster(cl, numclust, criterion='maxclust')

where cl is the linkage matrix returned by your linkage method and numclust is the number of clusters you want to get. With criterion='maxclust', fcluster returns at most numclust flat clusters.
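
For the k=3 and k=6 cases in the question, a minimal sketch might look like the following. The random toy data, the Ward linkage, and the name labels_by_k are illustrative assumptions, not part of the original question or answer.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy data (assumption): 30 observations with 4 features each.
rng = np.random.default_rng(0)
X = rng.standard_normal((30, 4))

# Build the linkage matrix once; Ward is only an example method.
Z = linkage(X, method="ward")

# Cut the same tree at several cluster counts.
# With criterion="maxclust", t is the maximum number of flat clusters.
labels_by_k = {k: fcluster(Z, t=k, criterion="maxclust") for k in (3, 6)}

for k, labels in labels_by_k.items():
    # fcluster labels start at 1, one label per observation.
    print(k, len(np.unique(labels)), labels)
```

Because the linkage matrix is computed only once, inspecting another level of the dendrogram is just another fcluster call with a different t.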

dkar answered Sep 28 '22 05:09



Hierarchical clustering lets you zoom in and out to get fine- or coarse-grained views of the clustering, so it might not be clear in advance at which level to cut the dendrogram. A simple solution is to get the cluster membership at every level with cut_tree. You can also request only the numbers of clusters you are interested in.

import numpy as np
from scipy import cluster

np.random.seed(23)
X = np.random.randn(20, 4)  # 20 observations, 4 features

# Linkage matrix from Ward's method
Z = cluster.hierarchy.ward(X)

# One column per level of the tree
cutree_all = cluster.hierarchy.cut_tree(Z)

# One column per requested number of clusters
cutree1 = cluster.hierarchy.cut_tree(Z, n_clusters=[5, 10])

print("membership at all levels \n", cutree_all)
print("membership for 5 and 10 clusters \n", cutree1)
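
To inspect which observations fall into each cluster at a given k, the columns of the cut_tree result can be regrouped by label. The sketch below is one possible way to do that; the names ks, cut, and groups, and the toy data, are assumptions made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import ward, cut_tree

# Toy data (assumption): 20 observations with 4 features each.
rng = np.random.default_rng(23)
X = rng.standard_normal((20, 4))
Z = ward(X)

ks = [5, 10]
# cut has one row per observation and one column per entry in ks.
cut = cut_tree(Z, n_clusters=ks)

# For each k, map every cluster label to the indices of its members.
groups = {}
for col, k in enumerate(ks):
    labels = cut[:, col]
    groups[k] = {
        int(c): np.flatnonzero(labels == c).tolist()
        for c in np.unique(labels)
    }

for k, members in groups.items():
    print(k, members)
```

Note that cut_tree labels start at 0, whereas fcluster labels start at 1, so the two label sets are not directly comparable without relabeling.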
Charlie Carroll answered Sep 28 '22 05:09
