
How to colour the labels of a dendrogram by an additional factor variable in R

I have produced a dendrogram after running hierarchical clustering analysis in R using the code below. I am now trying to colour the labels according to another factor variable, which is saved as a vector. The closest I have come is colour-coding the branches using the ColorDendrogram function in the sparcl package, but I would prefer to colour-code the labels. I have found answers to similar questions ("Color branches of dendrogram using an existing column" and "Colouring branches in a dendrogram in R"), but I have not been able to work out how to adapt the example code for my purpose. Below is some example data and code.

> dput(df)
structure(list(labs = c("a1", "a2", "a3", "a4", "a5", "a6", "a7", 
"a8", "b1", "b2", "b3", "b4", "b5", "b6", "b7"), var = c(1L, 
1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 2L), td = c(13.1, 
14.5, 16.7, 12.9, 14.9, 15.6, 13.4, 15.3, 12.8, 14.5, 14.7, 13.1, 
14.9, 15.6, 14.6), fd = c(2L, 3L, 3L, 1L, 2L, 3L, 2L, 3L, 2L, 
4L, 2L, 1L, 4L, 3L, 3L)), .Names = c("labs", "var", "td", "fd"
), class = "data.frame", row.names = c(NA, -15L))

df.nw = df[,3:4]
labs = df$labs

d = dist(as.matrix(df.nw))                          # find distance matrix 
hc = hclust(d, method="complete")                   # apply hierarchical clustering 
plot(hc, hang=-0.01, cex=0.6, labels=labs, xlab="") # plot the dendrogram

hcd = as.dendrogram(hc)                             # convert hclust to dendrogram 
plot(hcd, cex=0.6)                                  # plot using dendrogram object

Var = df$var                                        # factor variable for colours
varCol = gsub("1","red",Var)                        # convert numbers to colours
varCol = gsub("2","blue",varCol)

# colour-code dendrogram branches by a factor 
library(sparcl)
ColorDendrogram(hc, y=varCol, branchlength=0.9, labels=labs,
                xlab="", ylab="", sub="")   

Any advice on how to do this would be greatly appreciated.

jjulip asked Dec 15 '14 13:12



2 Answers

Try

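# ... your code
# For each leaf: swap in the label from labs and set its label colour.
# a$label is the row index of the leaf, because the data has no row names.
colLab <- function(n) {
  if(is.leaf(n)) {
    a <- attributes(n)
    attr(n, "label") <- labs[a$label]                              # label to display
    attr(n, "nodePar") <- c(a$nodePar, lab.col = varCol[a$label])  # colour of that label
  }
  n
}
plot(dendrapply(hcd, colLab))  # apply colLab to every node, then plot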

(via)
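
The approach above works because every leaf of a dendrogram carries its own attributes, and plot() takes a leaf's label colour from the lab.col entry of its nodePar attribute. Below is a minimal, self-contained sketch of a variant, not the answer's exact code. It assumes the row names of the matrix passed to dist() are set to labs (the question's code does not do this), so that colours can be looked up by label instead of by row index. The names colour_leaf and label_cols are made up for illustration, and pch = NA is only there to suppress the point that is otherwise drawn at each leaf.

```r
# Example data from the question
df <- data.frame(
  labs = c("a1", "a2", "a3", "a4", "a5", "a6", "a7", "a8",
           "b1", "b2", "b3", "b4", "b5", "b6", "b7"),
  var  = c(1L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 2L),
  td   = c(13.1, 14.5, 16.7, 12.9, 14.9, 15.6, 13.4, 15.3,
           12.8, 14.5, 14.7, 13.1, 14.9, 15.6, 14.6),
  fd   = c(2L, 3L, 3L, 1L, 2L, 3L, 2L, 3L, 2L, 4L, 2L, 1L, 4L, 3L, 3L),
  stringsAsFactors = FALSE
)

# Assumption: use labs as row names so each leaf is labelled "a1", "a2", ...
x <- as.matrix(df[, c("td", "fd")])
rownames(x) <- df$labs
hcd <- as.dendrogram(hclust(dist(x), method = "complete"))

# Named lookup table: label -> colour (1 = red, 2 = blue, as in the question)
label_cols <- setNames(c("red", "blue")[df$var], df$labs)

# Hypothetical helper: attach a label colour to every leaf
colour_leaf <- function(node) {
  if (is.leaf(node)) {
    lab <- attr(node, "label")
    attr(node, "nodePar") <- c(attr(node, "nodePar"),
                               list(lab.col = label_cols[[lab]], pch = NA))
  }
  node
}

coloured <- dendrapply(hcd, colour_leaf)
plot(coloured)
```

Looking colours up by name means the result does not depend on the order of the leaves or on the labels being row indices.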

lukeA answered Oct 17 '22 05:10


To colour the labels, the easiest approach is to use the labels_colors function from the dendextend package. For example:

# install.packages("dendextend")
library(dendextend)

small_iris <- iris[c(1, 51, 101, 2, 52, 102), ]
dend <- as.dendrogram(hclust(dist(small_iris[,-5])))
# Equivalent, using the pipe:
# dend <- small_iris[,-5] %>% dist %>% hclust %>% as.dendrogram

# By default, the dend has no colors to the labels
labels_colors(dend)
par(mfrow = c(1,2))
plot(dend, main = "Original dend")

# let's add some color:
colors_to_use <- as.numeric(small_iris[,5])
colors_to_use
# But sort them based on their order in dend:
colors_to_use <- colors_to_use[order.dendrogram(dend)]
colors_to_use
# Now we can use them
labels_colors(dend) <- colors_to_use
# Now each label has a color
labels_colors(dend) 
plot(dend, main = "A color for every Species")

For more details on the package, you can have a look at its vignette.
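
The same idea can be applied to the data from the question. The following is a sketch only, assuming dendextend is installed and reusing the question's red/blue mapping for var. The variable name ord is made up for illustration. As in the iris example, both the labels and the colours are reordered with order.dendrogram() so that they line up with the leaves from left to right.

```r
library(dendextend)

# Example data from the question
df <- data.frame(
  labs = c("a1", "a2", "a3", "a4", "a5", "a6", "a7", "a8",
           "b1", "b2", "b3", "b4", "b5", "b6", "b7"),
  var  = c(1L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 2L),
  td   = c(13.1, 14.5, 16.7, 12.9, 14.9, 15.6, 13.4, 15.3,
           12.8, 14.5, 14.7, 13.1, 14.9, 15.6, 14.6),
  fd   = c(2L, 3L, 3L, 1L, 2L, 3L, 2L, 3L, 2L, 4L, 2L, 1L, 4L, 3L, 3L),
  stringsAsFactors = FALSE
)

hc   <- hclust(dist(as.matrix(df[, c("td", "fd")])), method = "complete")
dend <- as.dendrogram(hc)

# Row index of each leaf, in left-to-right plotting order
ord <- order.dendrogram(dend)

# Replace the default labels with labs, then colour them by var
labels(dend)        <- df$labs[ord]
labels_colors(dend) <- c("red", "blue")[df$var][ord]

plot(dend, main = "Labels coloured by var")
```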


Tal Galili answered Oct 17 '22 05:10