I am trying to plot the results of a hierarchical clustering in R
as a dendrogram, with rectangles identifying clusters.
The following code does the trick for a vertical dendrogram, but for a horizontal dendrogram, (horiz=TRUE
), the rectangles are not drawn. Is there any way to do the same for horizontal dendrograms too.
library("cluster")
dst <- daisy(iris, metric = c("gower"), stand = FALSE)
hca <- hclust(dst, method = "average")
plot(as.dendrogram(hca), horiz = FALSE)
rect.hclust(hca, k = 3, border = "red")
Moreover I would like to plot a line to cut the tree at a desired distance value. How to plot that in R. The cutree
function returns the clusters, but is it possible to plot it as well.
cutree(hca, k = 3)
The desired output that I am looking for is like this.
How to get this done in R?
Both jlhoward and Backlin answers are good.
What you could also try is using the dendextend
package, designed exactly for this sort of thing. It has a rect.dendrogram
function which works like rect.hclust
, but with a horiz parameter (plus some more control over the location of the edge of the rect). For finding the relevant height you can use the heights_per_k.dendrogram
function (which is much faster when also using the dendextendRcpp
package)
Here is a simple example for how you would get the same result as in the above examples (with an added bonus of colored branches, just for fun):
install.packages("dendextend")
install.packages("dendextendRcpp")
library("dendextend")
library("dendextendRcpp")
# using piping to get the dend
dend <- iris[,-5] %>% dist %>% hclust %>% as.dendrogram
# plot + color the dend's branches before, based on 3 clusters:
dend %>% color_branches(k=3) %>% plot(horiz=TRUE, main = "The dendextend package \n Gives extended functionality to R's dendrogram object")
# add horiz rect
dend %>% rect.dendrogram(k=3,horiz=TRUE)
# add horiz (well, vertical) line:
abline(v = heights_per_k.dendrogram(dend)["3"] + .6, lwd = 2, lty = 2, col = "blue")
Here's a solution using ggplot
and the ggdendro
package. As an added bonus, we can color the labels by cluster...
library(cluster)
dst <- daisy(iris, metric = c("gower"), stand = FALSE)
hca <- hclust(dst, method = "average")
k <- 3
clust <- cutree(hca,k=k) # k clusters
library(ggplot2)
library(ggdendro) # for dendro_data(...)
dendr <- dendro_data(hca, type="rectangle") # convert for ggplot
clust.df <- data.frame(label=rownames(iris), cluster=factor(clust))
dendr[["labels"]] <- merge(dendr[["labels"]],clust.df, by="label")
rect <- aggregate(x~cluster,label(dendr),range)
rect <- data.frame(rect$cluster,rect$x)
ymax <- mean(hca$height[length(hca$height)-((k-2):(k-1))])
ggplot() +
geom_segment(data=segment(dendr), aes(x=x, y=y, xend=xend, yend=yend)) +
geom_text(data=label(dendr), aes(x, y, label=label, hjust=0, color=cluster),
size=3) +
geom_rect(data=rect, aes(xmin=X1-.3, xmax=X2+.3, ymin=0, ymax=ymax),
color="red", fill=NA)+
geom_hline(yintercept=0.33, color="blue")+
coord_flip() + scale_y_reverse(expand=c(0.2, 0)) +
theme_dendro()
To just get the job done (although in a quite ugly way) you could just manually swap the coordinates in the call to rect
in rect.hclust
:
rhc <- function (tree, k = NULL, which = NULL, x = NULL, h = NULL, border = 2,
cluster = NULL)
{
if (length(h) > 1L | length(k) > 1L)
stop("'k' and 'h' must be a scalar")
if (!is.null(h)) {
if (!is.null(k))
stop("specify exactly one of 'k' and 'h'")
k <- min(which(rev(tree$height) < h))
k <- max(k, 2)
}
else if (is.null(k))
stop("specify exactly one of 'k' and 'h'")
if (k < 2 | k > length(tree$height))
stop(gettextf("k must be between 2 and %d", length(tree$height)),
domain = NA)
if (is.null(cluster))
cluster <- cutree(tree, k = k)
clustab <- table(cluster)[unique(cluster[tree$order])]
m <- c(0, cumsum(clustab))
if (!is.null(x)) {
if (!is.null(which))
stop("specify exactly one of 'which' and 'x'")
which <- x
for (n in seq_along(x)) which[n] <- max(which(m < x[n]))
}
else if (is.null(which))
which <- 1L:k
if (any(which > k))
stop(gettextf("all elements of 'which' must be between 1 and %d",
k), domain = NA)
border <- rep_len(border, length(which))
retval <- list()
for (n in seq_along(which)) {
rect(
ybottom = m[which[n]] + 0.66,
xright = par("usr")[3L],
ytop = m[which[n] + 1] + 0.33,
xleft = mean(rev(tree$height)[(k - 1):k]),
border = border[n])
retval[[n]] <- which(cluster == as.integer(names(clustab)[which[n]]))
}
invisible(retval)
}
and call rhc
like you called rect.hclust
:
rhc(hca, k = 3, border = "red")
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With