
Visualize lowest nodes in hierarchical clustering with dendrogram

I'm using linkage to generate an agglomerative hierarchical clustering for a dataset of around 5000 instances. I want to visualize the 'bottom' merges in the hierarchy, that is, the nodes close to the leaves with the smallest distance measures.

Unfortunately, dendrogram favors the 'top' nodes, i.e. the last merges made by the algorithm. By default it shows only the top 30 nodes and collapses the bottom of the tree. I can increase the P argument to show more nodes, but I would have to show all 5000+ to reach the lowest levels of the clustering, at which point the plot is no longer readable.

MCVE

For example, starting from the example in the linkage documentation:

openExample('stats/CompareClusterAssignmentsToClustersExample')
run CompareClusterAssignmentsToClustersExample
dendrogram(Z, 'Orient', 'Left', 'Labels', species);

This produces a dendrogram with the top 30 nodes visible. The nodes with numerical labels are collapsed subtrees that hide the lower levels of the tree.

[Figure: dendrogram with collapsed lower levels]
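To see which observations a numbered node hides, dendrogram's second output T maps every observation to the plotted leaf that contains it. A minimal sketch, assuming Z comes from average linkage on the fisheriris data (the documentation example may use a different method or metric):

```matlab
% Assumption: average linkage on fisheriris stands in for the example's Z.
load fisheriris                          % meas (150x4), species (150x1 cell)
Z = linkage(meas, 'average');

% T(i) is the plotted leaf node that contains observation i.
[~, T] = dendrogram(Z, 30, 'Orientation', 'left');

counts = accumarray(T(:), 1);            % observations behind each plotted leaf
collapsed = find(counts > 1);            % plotted leaves that stand for several observations
for leaf = collapsed'
    fprintf('leaf %d hides %d observations: %s\n', leaf, counts(leaf), ...
        mat2str(find(T == leaf)'));
end
```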

I can increase the number of visible nodes to include all leaves at the expense of readability.

dendrogram(Z, 0, 'Orient', 'Left', 'Labels', species);  % P = 0 draws every leaf

[Figure: dendrogram with all leaves]

What I'd Like

What I'd really like is a zoomed-in version of the plot above, like the example below, but showing the 30 closest clusters.

[Figure: zoom of dendrogram with all leaves]

What I've Tried

I tried passing dendrogram only the first 30 rows of Z,

dendrogram(Z(1:30, :), 'Orient', 'Left');

but that throws an "Index exceeds matrix dimensions." error when one of the rows references a cluster in a row > 30.
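For reference, a truncated Z can be made acceptable to dendrogram by renumbering it as a tree of its own. The sketch below is a hypothetical workaround, not a dendrogram feature: it keeps the first k rows, relabels the observations they touch as 1..n and the new clusters as n+1..n+k, and then joins the remaining separate subtrees with artificial links at a made-up height topH, so the matrix has the n-1 rows dendrogram expects. It assumes average linkage on fisheriris; k, topH, and the other variable names are illustrative, and links drawn at topH are not real merges.

```matlab
% Assumptions: average linkage on fisheriris stands in for the example's Z;
% topH is a hypothetical display height for the artificial joining links.
load fisheriris
Z = linkage(meas, 'average');
k = 30;                                  % number of lowest merges to keep

m = size(Z, 1) + 1;                      % number of original observations
rows = Z(1:k, 1:2);                      % children of the first k merges
leafIdx = unique(rows(rows <= m));       % observations touched by those merges
n = numel(leafIdx);

% Renumber so the subtree is self-consistent:
% kept observations -> 1..n, cluster m+j -> n+j.
newId = zeros(1, m + k);
newId(leafIdx) = 1:n;
newId(m + (1:k)) = n + (1:k);
Zk = [newId(rows), Z(1:k, 3)];

% The first k merges form a forest of n-k subtrees. Join their roots with
% artificial links just above the highest real merge to get one full tree.
isChild = false(1, n + k);
isChild(Zk(:, 1:2)) = true;
roots = n + find(~isChild(n+1:n+k));     % clusters not merged again within k
topH = 1.05 * max(Zk(:, 3)) + eps;
current = roots(1);
for r = roots(2:end)
    Zk(end+1, :) = [sort([current, r]), topH]; %#ok<AGROW>
    current = n + size(Zk, 1);           % id of the cluster just created
end

figure
dendrogram(Zk, 0, 'Orientation', 'left', 'Labels', species(leafIdx));
title(sprintf('Lowest %d merges (links at the top height are artificial)', k));
```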

I also tried the dendrogram 'Reorder' option, but I am having difficulty finding a valid ordering that sorts the clusters from closest to farthest.

% Rows of Z are sorted from the closest merge to the farthest,
% so I can use them to build an ordering.
Y = reshape(Z(:, 1:2)', 1, []);   % node indices in merge order
Y = Y(Y < 151);                   % keep only the 150 observations (leaves)
dendrogram(Z, 30, 'Orient', 'Left', 'Labels', species, 'Reorder', Y);

I get the error:

In the requested ordering of the nodes, some data points belonging to the same leaf in the plot are separated by the points belonging to other leaves. Try to use a different ordering.

Such an ordering may be impossible when the entire tree is considered, because it would require branch crossings, but I'm hoping a better ordering exists if I only look at a portion of the tree and ignore clusters at higher levels.
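The error message suggests that a valid 'Reorder' vector must keep the observations of each plotted leaf next to each other. Concatenating the columns of Z does not guarantee this, but a depth-first walk of the tree does, for every subtree at once. A sketch, assuming average linkage on fisheriris; it only shows what an acceptable ordering looks like and does not by itself move the lowest merges to one end of the plot:

```matlab
% Assumption: average linkage on fisheriris stands in for the example's Z.
load fisheriris
Z = linkage(meas, 'average');
m = size(Z, 1) + 1;                      % number of observations

order = zeros(1, m);
count = 0;
stack = 2*m - 1;                         % id of the root cluster (last row of Z)
while ~isempty(stack)
    node = stack(end);
    stack(end) = [];                     % pop
    if node <= m                         % a leaf: emit the observation
        count = count + 1;
        order(count) = node;
    else                                 % a cluster: visit both children
        kids = Z(node - m, 1:2);
        % push the right child first so the left child is popped first
        stack = [stack, kids(2), kids(1)]; %#ok<AGROW>
    end
end

% Every subtree's observations are contiguous in order, so the
% collapsed leaves of the P = 30 plot are never split apart.
dendrogram(Z, 30, 'Orientation', 'left', 'Labels', species, 'Reorder', order);
```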

Question

How can I improve my visualization to show the lowest level clusters in the dendrogram?

asked Jul 26 '17 at 19:07 by Cecilia



1 Answer

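Emmm... like ylim()? With 'Orient', 'Left' the leaves are laid out along the y-axis, so you can draw the full tree and then restrict the y-limits to the top 30 leaf positions: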

dendrogram(Z, 0, 'Orient', 'Left', 'Labels', species);  % P = 0 draws every leaf
ylim(max(ylim())-[30,0]);                                % show only the top 30 leaf positions

yields

[Figure: zoomed-in dendrogram]
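The window does not have to be the top 30 positions. Assuming, as the ylim call above does, that the leaves of a 'left' dendrogram sit at y = 1, 2, ..., m in the order given by the third output outperm (bottom to top), the window can be centered on a specific pair instead, e.g. the closest pair, which is always row 1 of Z. A sketch under those assumptions; w is an arbitrary half-width and Z again comes from average linkage on fisheriris:

```matlab
% Assumptions: average linkage on fisheriris stands in for the example's Z;
% leaves are drawn at y = 1..m in outperm order; w is a display choice.
load fisheriris
Z = linkage(meas, 'average');
m = size(Z, 1) + 1;

% With P = 0 every observation gets its own leaf.
[~, ~, outperm] = dendrogram(Z, 0, 'Orientation', 'left', 'Labels', species);
pos = zeros(1, m);
pos(outperm) = 1:m;                      % y position of each observation

w = 15;                                  % leaf positions shown on each side
yc = mean(pos(Z(1, 1:2)));               % row 1 of Z always joins two observations
ylim([yc - w, yc + w]);
```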

answered Oct 03 '22 at 02:10 by X Zhang
