 

Label and color the leaves of a dendrogram

I am trying to create a dendrogram where my samples have 5 group codes (they act as a sample name, species, etc., but they are repeated across samples).

I have two questions, and any help would be great:

  • How can I show the group codes as the leaf labels (instead of the sample numbers)?

  • I wish to assign a color to each group code and color the leaf labels accordingly (samples from the same group might not end up in the same clade, and seeing that could tell me more).

Is it possible to do this with my script (using ape or ggdendro)?

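library(ape)       # as.phylo()
library(ggdendro)  # ggdendrogram()
sample <- read.table("C:/.../DOutput.txt", header = FALSE, sep = "")
groupCodes <- sample[, 1]           # column 1: group code of each sample
sample2 <- sample[, 2:100]          # columns 2-100: measurements
d <- dist(sample2, method = "euclidean")
fit <- hclust(d, method = "ward.D") # "ward" was renamed "ward.D" in R 3.1.0
plot(as.phylo(fit), type = "fan")
ggdendrogram(fit, theme_dendro = FALSE)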

A random data frame to replace my read.table():

sample <- data.frame(matrix(floor(abs(rnorm(20000) * 100)), ncol = 200))  # 100 samples x 200 variables
groupCodes <- c(rep("A", 25), rep("B", 25), rep("C", 25), rep("D", 25))
sample <- data.frame(groupCodes, sample)  # group code in column 1, as in my file
asked Sep 14 '13 at 13:09 by lroca


People also ask

What is a leaf in a dendrogram?

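A dendrogram consists of many U-shaped lines that connect data points in a hierarchical tree. The height of each U represents the distance between the two clusters (single data points or groups of them) being joined. Each leaf at the bottom of the tree is one of the original data points. Some software summarizes large data sets (for example, more than 30 points) by letting one leaf stand for several points, but a dendrogram built with hclust() in R always has exactly one leaf per observation.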
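A quick base-R check of the one-leaf-per-observation point. This is a minimal sketch on made-up random data; the object names (x, hc, dend, nleaves) are illustrative only.

```r
# Illustrative data: 12 observations (rows) x 5 variables, random values
set.seed(1)
x <- matrix(rnorm(60), nrow = 12)

hc   <- hclust(dist(x))        # agglomerative clustering on Euclidean distances
dend <- as.dendrogram(hc)

# Every row of x becomes exactly one leaf of the dendrogram
nleaves <- length(order.dendrogram(dend))
print(nleaves)                 # [1] 12

# Each U-shape is one merge stored in hc$merge: n - 1 merges for n observations
print(nrow(hc$merge))          # [1] 11
```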

How do I plot a dendrogram in R?

The standard R function plot.hclust() (called simply as plot() on an hclust object) draws a dendrogram from the result of a hierarchical clustering computed with hclust(). A simplified form of its usage is: plot(x, labels = NULL, hang = 0.1, main = "Cluster Dendrogram", sub = NULL, xlab = NULL, ylab = "Height", ...)
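For the first part of the question (group codes instead of sample numbers), the labels argument of plot.hclust() may already be enough in base R. A minimal sketch, assuming random data shaped like the question's example and the "ward.D" method; note that plot.hclust() draws all labels in a single color, so per-group coloring needs another route.

```r
# Illustrative data in the shape of the question's random example:
# 100 samples, 4 group codes of 25 samples each
set.seed(42)
sample <- data.frame(matrix(floor(abs(rnorm(20000) * 100)), ncol = 200))
groupCodes <- rep(c("A", "B", "C", "D"), each = 25)

fit <- hclust(dist(sample, method = "euclidean"), method = "ward.D")

# labels are given in the row order of the data; plot.hclust places each
# label under its own observation, so no reordering is needed here
plot(fit, labels = groupCodes, hang = -1, cex = 0.6,
     main = "Cluster Dendrogram", xlab = "", sub = "")
```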


1 Answer

Here is a solution for this question using a new package called "dendextend", built for exactly this sort of thing.

Many more examples are in the package's presentations and vignettes, linked from the "usage" section at: https://github.com/talgalili/dendextend

Here is the solution. Note that the colors must be reordered twice: first to match the data (one color per sample, in row order), and then to match the left-to-right order of the leaves in the dendrogram.
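To make the two reordering steps visible, here is a small sketch on a hypothetical 6-sample data set with two groups (it assumes dendextend is installed). After both steps, each leaf's color should match the group encoded in that leaf's label.

```r
library(dendextend)  # provides labels_colors() and labels_colors<-

# Tiny illustrative data: 6 samples from 2 hypothetical groups
set.seed(7)
x <- matrix(rnorm(18), nrow = 6)
groupCodes <- c("Cont", "Tre1", "Cont", "Tre1", "Cont", "Tre1")
rownames(x) <- make.unique(groupCodes)   # "Cont", "Tre1", "Cont.1", ...
colorCodes <- c(Cont = "red", Tre1 = "blue")

dend <- as.dendrogram(hclust(dist(x)))

# Step 1: one color per sample, in the row order of the data
colsByRow <- colorCodes[groupCodes]
# Step 2: rearrange into the left-to-right order of the leaves
colsByLeaf <- colsByRow[order.dendrogram(dend)]

labels_colors(dend) <- colsByLeaf
plot(dend)
```

Skipping step 2 would still assign six colors, but they would follow the row order of the data rather than the order in which the leaves are drawn.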

####################
## Getting the data:

sample = data.frame(matrix(floor(abs(rnorm(20000)*100)),ncol=200))
groupCodes <- c(rep("Cont",25), rep("Tre1",25), rep("Tre2",25), rep("Tre3",25))
rownames(sample) <- make.unique(groupCodes)

colorCodes <- c(Cont="red", Tre1="green", Tre2="blue", Tre3="yellow")

distSamples <- dist(sample)
hc <- hclust(distSamples)
dend <- as.dendrogram(hc)

####################
## installing dendextend for the first time:

install.packages('dendextend')

####################
## Solving the question:

# loading the package
library(dendextend)
# Assigning the labels of dendrogram object with new colors:
labels_colors(dend) <- colorCodes[groupCodes][order.dendrogram(dend)]
# Plotting the new dendrogram
plot(dend)


####################
## A sub tree - so we can see better what we got:
par(cex = 1)
plot(dend[[1]], horiz = TRUE)
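If you prefer to stay with the ape fan plot from the question, a similar result may be possible without dendextend. This sketch assumes ape is installed and relies on as.phylo() numbering the tips in the original row order of the hclust input, so the tip colors can be given in data order. The palette is a hypothetical choice and the data are random.

```r
library(ape)  # as.phylo(), plot.phylo()

# Same layout as the question's random example: group code in column 1
set.seed(42)
sample <- data.frame(groupCodes = rep(c("A", "B", "C", "D"), each = 25),
                     matrix(floor(abs(rnorm(20000) * 100)), ncol = 200))
groupCodes <- as.character(sample[, 1])  # as.character guards against factors
fit <- hclust(dist(sample[, -1]), method = "ward.D")

# Hypothetical palette, one color per group code
colorCodes <- c(A = "red", B = "green4", C = "blue", D = "orange")

# Put the group codes into the tree as tip labels (repeats are allowed)
fit$labels <- groupCodes
phy <- as.phylo(fit)

# Tips keep the row order of the data, so tip.color needs no reordering
plot(phy, type = "fan", tip.color = colorCodes[groupCodes], cex = 0.6)
```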


answered Sep 19 '22 at 21:09 by Tal Galili