How do you compare the "similarity" between two dendrograms (in R)?

I have two dendrograms that I wish to compare in order to find out how "similar" they are. However, I don't know of any method for doing so, let alone code that implements one (say, in R).

Any leads?

UPDATE (2014-09-13):

Since asking this question, I have written an R package called dendextend for the visualization, manipulation and comparison of dendrograms. The package is on CRAN and comes with a detailed vignette. It includes functions such as cor_cophenetic, cor_bakers_gamma and Bk / Bk_plot, as well as a tanglegram function for visually comparing two trees.
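A rough sketch of how the functions named above might be called. The data set (the built-in USArrests, cut down to 15 rows) and the two linkage methods are assumptions chosen only for illustration, and dendextend has to be installed from CRAN first:

```r
library(dendextend)  # assumes install.packages("dendextend") has been run

# Illustrative data: two trees built from the same distances
# with different linkage methods (an arbitrary choice).
d <- dist(scale(USArrests[1:15, ]))
dend1 <- as.dendrogram(hclust(d, method = "complete"))
dend2 <- as.dendrogram(hclust(d, method = "average"))

# Correlation between the two cophenetic distance matrices
coph <- cor_cophenetic(dend1, dend2)

# Baker's Gamma correlation between the two trees
gamma <- cor_bakers_gamma(dend1, dend2)

print(c(cophenetic = coph, bakers_gamma = gamma))

# Fowlkes-Mallows Bk as a function of the number of clusters k
Bk_plot(dend1, dend2)

# Side-by-side view with lines connecting matching leaves
tanglegram(dend1, dend2)
```

Both correlations lie between -1 and 1, and values close to 1 suggest the two trees are similar.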

Tal Galili asked Feb 07 '10 21:02


2 Answers

Comparing dendrograms is not quite the same as comparing hierarchical clusterings, because a dendrogram includes the lengths of the branches as well as the splits, but I think comparing the clusterings is a good start. I would suggest you read E. B. Fowlkes & C. L. Mallows (1983). "A Method for Comparing Two Hierarchical Clusterings". Journal of the American Statistical Association 78 (383): 553–584.

Their approach is based on cutting the trees at each level k, getting a measure Bk that compares the groupings into k clusters, and then examining the Bk vs k plots. The measure Bk is based upon looking at pairs of objects and seeing whether they fall into the same cluster or not.
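A minimal base-R sketch of this idea, assuming both trees are hclust objects built on the same observations in the same order. The helper name fm_index, the USArrests data and the linkage methods are my own choices for illustration, not something taken from the paper:

```r
# Bk for a given k: cut both trees into k clusters and count the
# pairs of objects that share a cluster in both clusterings.
fm_index <- function(hc1, hc2, k) {
  a <- cutree(hc1, k = k)
  b <- cutree(hc2, k = k)
  m <- table(a, b)                 # k x k matching matrix
  n <- length(a)
  Tk <- sum(m^2) - n               # co-clustered in both trees
  Pk <- sum(rowSums(m)^2) - n      # co-clustered in tree 1
  Qk <- sum(colSums(m)^2) - n      # co-clustered in tree 2
  Tk / sqrt(Pk * Qk)
}

# Illustrative data: same distances, two linkage methods
d <- dist(scale(USArrests))
hc1 <- hclust(d, method = "complete")
hc2 <- hclust(d, method = "average")

# k = n is skipped because no pair shares a cluster there (0 / 0)
ks <- 2:10
bk <- sapply(ks, function(k) fm_index(hc1, hc2, k))

plot(ks, bk, type = "b", ylim = c(0, 1),
     xlab = "k (number of clusters)", ylab = "Bk")
```

Bk lies between 0 and 1, and it equals 1 when the two cuts give identical groupings.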

I am sure that one can write code based on this method, but first we would need to know how the dendrograms are represented in R.
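For reference, a short sketch of the two representations base R offers, using USArrests purely as example data: hclust returns a list describing the merges, and as.dendrogram turns that into a nested list with attributes on each node.

```r
hc <- hclust(dist(scale(USArrests)), method = "average")

# An hclust object is a list; the tree itself is stored in
# merge (which clusters join at each step), height (at what
# height they join) and order (leaf order for plotting).
names(hc)
head(hc$merge)
head(hc$height)

# A dendrogram is a nested list; each node carries attributes
# such as "height" and "members" (number of leaves below it).
dend <- as.dendrogram(hc)
attr(dend, "height")
attr(dend, "members")

# cutree() turns an hclust tree into flat cluster labels
groups <- cutree(hc, k = 4)
table(groups)
```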

Aniko answered Sep 30 '22 06:09


As you know, dendrograms arise from hierarchical clustering, so what you are really asking is how to compare the results of two hierarchical clustering runs. There are no standard metrics I know of, but I would look at the number of clusters found and compare the membership of corresponding clusters. Here is a good overview of hierarchical clustering that my colleague wrote, based on clustering Scotch whiskies.
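One simple way to look at membership overlap, sketched in base R. The data set, the linkage methods and the choice of four clusters are all assumptions made for the example:

```r
d <- dist(scale(USArrests))
run1 <- hclust(d, method = "complete")
run2 <- hclust(d, method = "ward.D2")

# Cut both trees into the same number of clusters
k <- 4
members1 <- cutree(run1, k = k)
members2 <- cutree(run2, k = k)

# Cross-tabulate the labels: a table with one dominant cell per
# row and column means the two runs found roughly the same
# groups, even if the cluster numbers themselves differ.
overlap <- table(run1 = members1, run2 = members2)
print(overlap)

# For each cluster of run 1, the share of its members that land
# in its best-matching cluster of run 2
best_match <- apply(overlap, 1, max) / rowSums(overlap)
print(round(best_match, 2))
```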

Paul answered Sep 30 '22 04:09