I have produced a dendrogram after running hierarchical clustering analysis in R using the below code. I am now trying to colour the labels according to another factor variable, which is saved as a vector. The closest that I have come to achieving this is to colour code the branches using the ColourDendrogram
function in the sparcl
package. If possible, I would prefer to colour-code the labels. I have found answers to a similar questions at the following links Color branches of dendrogram using an existing column & Colouring branches in a dendrogram in R, but I have not been able to work out how to convert the example code for my purpose. Below is some example data and code.
> dput(df)
structure(list(labs = c("a1", "a2", "a3", "a4", "a5", "a6", "a7",
"a8", "b1", "b2", "b3", "b4", "b5", "b6", "b7"), var = c(1L,
1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 2L), td = c(13.1,
14.5, 16.7, 12.9, 14.9, 15.6, 13.4, 15.3, 12.8, 14.5, 14.7, 13.1,
14.9, 15.6, 14.6), fd = c(2L, 3L, 3L, 1L, 2L, 3L, 2L, 3L, 2L,
4L, 2L, 1L, 4L, 3L, 3L)), .Names = c("labs", "var", "td", "fd"
), class = "data.frame", row.names = c(NA, -15L))
df.nw = df[,3:4]
labs = df$labs
d = dist(as.matrix(df.nw)) # find distance matrix
hc = hclust(d, method="complete") # apply hierarchical clustering
plot(hc, hang=-0.01, cex=0.6, labels=labs, xlab="") # plot the dendrogram
hcd = as.dendrogram(hc) # convert hclust to dendrogram
plot(hcd, cex=0.6) # plot using dendrogram object
Var = df$var # factor variable for colours
varCol = gsub("1","red",Var) # convert numbers to colours
varCol = gsub("2","blue",varCol)
# colour-code dendrogram branches by a factor
library(sparcl)
ColorDendrogram(hc, y=varCol, branchlength=0.9, labels=labs,
xlab="", ylab="", sub="")
Any advise on how to do this would be greatly appreciated.
The hclust function in R uses the complete linkage method for hierarchical clustering by default. This particular clustering method defines the cluster distance between two clusters to be the maximum distance between their individual components.
How to read a dendrogram. The key to interpreting a dendrogram is to focus on the height at which any two objects are joined together. In the example above, we can see that E and F are most similar, as the height of the link that joins them together is the smallest. The next two most similar objects are A and B.
As you already know, the standard R function plot. hclust() can be used to draw a dendrogram from the results of hierarchical clustering analyses (computed using hclust() function). A simplified format is: plot(x, labels = NULL, hang = 0.1, main = "Cluster dendrogram", sub = NULL, xlab = NULL, ylab = "Height", ...)
Try
# ... your code
colLab <- function(n) {
if(is.leaf(n)) {
a <- attributes(n)
attr(n, "label") <- labs[a$label]
attr(n, "nodePar") <- c(a$nodePar, lab.col = varCol[a$label])
}
n
}
plot(dendrapply(hcd, colLab))
(via)
For coloring your labels, it would be the easiest to use the labels_colors
function from the dendextend package. For example:
# install.packages("dendextend")
library(dendextend)
small_iris <- iris[c(1, 51, 101, 2, 52, 102), ]
dend <- as.dendrogram(hclust(dist(small_iris[,-5])))
# Like:
# dend <- small_iris[,-5] %>% dist %>% hclust %>% as.dendrogram
# By default, the dend has no colors to the labels
labels_colors(dend)
par(mfrow = c(1,2))
plot(dend, main = "Original dend")
# let's add some color:
colors_to_use <- as.numeric(small_iris[,5])
colors_to_use
# But sort them based on their order in dend:
colors_to_use <- colors_to_use[order.dendrogram(dend)]
colors_to_use
# Now we can use them
labels_colors(dend) <- colors_to_use
# Now each state has a color
labels_colors(dend)
plot(dend, main = "A color for every Species")
For more details on the package, you can have a look at its vignette.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With