
How to plot dendrograms with large datasets?

I am using the ape (Analyses of Phylogenetics and Evolution) package in R, which has dendrogram drawing functionality. I use the following commands to read the data in Newick format and draw a dendrogram with the plot function:

    library("ape")
    gcPhylo <- read.tree(file = "gc.tree")
    plot(gcPhylo, show.node.label = TRUE)

Because the data set is quite large, the lower levels of the tree are unreadable. I can make out only the top few levels; everything below them is a solid black area with no visible detail.

Does the plot function have any zoom capability? I tried to limit the area using xLim and yLim, but they only restrict the plotted region and do not zoom in enough to make the details visible. Either zooming, or some way of making the details visible without zooming, would solve my problem.

I would also appreciate pointers to any other package, function, or tool that would help me overcome this problem.

Thanks.

asked Sep 13 '11 at 15:09 by Burcu


2 Answers

It is possible to cut a dendrogram at a specified height and plot the elements:

First create a clustering using the built-in dataset USArrests. Then convert to a dendrogram:

    hc <- hclust(dist(USArrests))
    hcd <- as.dendrogram(hc)

Next, use cut.dendrogram to cut at a specified height, in this case h=75. This returns a list with two components: upper, a dendrogram for the part of the tree above the cut, and lower, a list of dendrograms, one for each branch below the cut:

    par(mfrow=c(3,1))

    plot(hcd, main="Main")
    plot(cut(hcd, h=75)$upper,
         main="Upper tree of cut at h=75")
    plot(cut(hcd, h=75)$lower[[2]],
         main="Second branch of lower tree with cut at h=75")
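With a large tree it is not obvious in advance which lower branch is worth looking at. The following is a minimal sketch, using the same USArrests example, that stores the result of the cut once, counts the leaves in each branch below the cut, and plots only the largest one. The variable names (parts, branch_sizes, biggest) are illustrative, and h = 75 is simply the height used above; a real tree would need a height chosen for its own scale.

```r
# Same example clustering as above
hc  <- hclust(dist(USArrests))
hcd <- as.dendrogram(hc)

# Cut once and reuse the result instead of calling cut() for every plot
parts <- cut(hcd, h = 75)

# Number of leaves in each branch below the cut
branch_sizes <- sapply(parts$lower, function(b) length(labels(b)))
print(branch_sizes)

# Plot only the largest branch (illustrative choice)
biggest <- which.max(branch_sizes)
plot(parts$lower[[biggest]],
     main = paste("Largest branch below h = 75:", branch_sizes[biggest], "leaves"))
```

The same idea can be repeated on a branch that is still too dense: cut it again at a lower height and inspect its sub-branches.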


answered Sep 21 '22 at 05:09 by Andrie


The cut function described in the other answer is a very good solution. If you would rather keep the whole tree on one page for interactive exploration, you could also plot it to a PDF with a large page size.

The resulting PDF is a vector graphic, so you can zoom in closely with your favourite PDF viewer without loss of resolution.

Here's an example of how to direct plot output to PDF:

    # Open a PDF for plotting; units are inches by default
    pdf("/path/to/a/pdf/file.pdf", width=40, height=15)

    # Do some plotting
    plot(gcPhylo)

    # Close the PDF file's associated graphics device (necessary to finalize the output)
    dev.off()
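Rather than guessing the page size, one option is to derive it from the number of leaves so that each label gets a fixed amount of room. The sketch below is an illustration only: it uses the built-in USArrests data instead of gcPhylo so that it runs on its own, writes to a temporary file, and assumes 0.2 inches per leaf, a value that would need tuning for a real tree. For an ape tree, the leaf count could be taken from Ntip(gcPhylo) instead.

```r
# Example dendrogram from built-in data (stand-in for a large tree)
hc  <- hclust(dist(USArrests))
hcd <- as.dendrogram(hc)

# Scale the page height with the number of leaves
n_leaves        <- length(labels(hcd))
inches_per_leaf <- 0.2                      # assumed spacing; adjust to taste
pdf_height      <- max(7, n_leaves * inches_per_leaf)

# Write to a temporary file so the example does not touch existing files
out_file <- file.path(tempdir(), "dendrogram.pdf")
pdf(out_file, width = 10, height = pdf_height)

# horiz = TRUE draws one leaf per row, so extra height spreads the labels out
plot(hcd, horiz = TRUE, main = "Full tree, page height scaled to leaf count")

# Close the device to finalize the file
dev.off()
```

Drawing the tree horizontally means only one dimension has to grow with the data, which keeps the page proportions manageable in a PDF viewer.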

answered Sep 20 '22 at 05:09 by MatthewS