
Plot dendrogram using sklearn.AgglomerativeClustering

I'm trying to build a dendrogram using the children_ attribute provided by AgglomerativeClustering, but so far I'm out of luck. I can't use scipy.cluster because the agglomerative clustering provided in scipy lacks some options that are important to me (such as the option to specify the number of clusters). I would be really grateful for any advice.

    from sklearn import cluster

    clusterer = cluster.AgglomerativeClustering(n_clusters=2)
    clusterer.children_  # only available after clusterer.fit(X)
Shukhrat Khannanov asked Mar 18 '15 16:03




2 Answers

Here is a simple function for taking a hierarchical clustering model from sklearn and plotting it using the scipy dendrogram function. Plotting functions are often not directly supported in sklearn. There is an interesting discussion of this in the pull request for this plot_dendrogram code snippet.
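A minimal sketch of such a function, not necessarily the original snippet: it converts the model's children_ array into the linkage matrix format that scipy's dendrogram expects. The helper name linkage_matrix_from_model is hypothetical. The sketch assumes the model was fitted with the full tree available (children_ has n_samples - 1 rows), and it uses distances_ only if the model exposes it (for example when fitted with compute_distances=True in recent sklearn versions); otherwise it falls back to the merge order as a stand-in for height.

```python
import numpy as np


def linkage_matrix_from_model(model):
    """Build a scipy-style linkage matrix from a fitted AgglomerativeClustering.

    Assumes model.children_ describes the full tree (n_samples - 1 merges).
    """
    children = np.asarray(model.children_)
    n_samples = children.shape[0] + 1

    # Count the original observations under each merge node.
    counts = np.zeros(children.shape[0])
    for i, (left, right) in enumerate(children):
        total = 0
        for child in (left, right):
            if child < n_samples:
                total += 1                          # child is an original sample
            else:
                total += counts[child - n_samples]  # child is an earlier merge
        counts[i] = total

    # Use real merge distances if the model has them; otherwise fall back
    # to the merge order (an assumption, heights are then not meaningful).
    distances = getattr(model, "distances_", None)
    if distances is None:
        distances = np.arange(1, children.shape[0] + 1)

    return np.column_stack([children, distances, counts]).astype(float)


def plot_dendrogram(model, **kwargs):
    """Plot the dendrogram of a fitted model; kwargs go to scipy's dendrogram."""
    from scipy.cluster.hierarchy import dendrogram

    return dendrogram(linkage_matrix_from_model(model), **kwargs)
```

After fitting, something like plot_dendrogram(clusterer, truncate_mode='level', p=3) followed by matplotlib's plt.show() should draw the top levels of the tree.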

I'd clarify that the use case you describe (defining the number of clusters) is available in scipy: after you've performed the hierarchical clustering using scipy's linkage, you can cut the hierarchy to whatever number of clusters you want using fcluster, passing the number of clusters as the t argument together with criterion='maxclust'.
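As a rough illustration of the fcluster approach, here is a small sketch on made-up data. The sample points and the choice of method='ward' are assumptions for the example only.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Toy data: two tight pairs of points (illustrative values only).
data = np.array([[0.0, 0.0], [0.1, -0.1], [1.0, 1.0], [1.1, 1.1]])

# Build the full hierarchy, then cut it into at most 2 flat clusters.
Z = linkage(data, method="ward")
labels = fcluster(Z, t=2, criterion="maxclust")

print(labels)  # one cluster id per row of data
```

The same Z can be passed to dendrogram for plotting, so the hierarchy only has to be computed once.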

David Diaz answered Sep 20 '22 09:09


Use the scipy implementation of agglomerative clustering instead. Here is an example.

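    import matplotlib.pyplot as plt
    from scipy.cluster.hierarchy import dendrogram, linkage

    data = [[0., 0.], [0.1, -0.1], [1., 1.], [1.1, 1.1]]

    Z = linkage(data)  # hierarchical clustering of the four points

    dendrogram(Z)      # draw the tree on the current matplotlib axes
    plt.show()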

Both linkage and dendrogram are documented in the scipy.cluster.hierarchy reference.

sebastianspiegel answered Sep 20 '22 09:09