
Creating a legend for a dendrogram with coloured leaves in R

I have coloured the leaves in a dendrogram as follows:

require(graphics)

dm <- hclust(dist(USArrests[1:5,]), "ave")

df<-data.frame("State"=c("Alabama","Alaska","Arizona","Arkansas","California"),   "Location"=c("South","North","West","South","West"))


color.sites<-function(dm){
    dend<-as.dendrogram(dm)
    plot(dend)

    cols <- attributes(dend)
    df$ColorGroups <- factor(df$Location)

    # Set colour palette
    Location.Pal <- rainbow(nlevels(df$ColorGroups), s=0.9,v=0.9,start=0.1,end=0.9,alpha=1)

    colorleaves <- function (n) {
        # only apply to "leaves", in other words the labels
        if(is.leaf(n)) {
            i <- which(df$State == attr(n,"label"))
            col.lab  <- Location.Pal[[unclass(df$ColorGroups[[i]])]]
            a <- attributes(n)
            attr(n, "nodePar") <- c(a$nodePar, list(lab.col = col.lab))
        }
        n
    }

    xx <- dendrapply(dend, colorleaves)

    plot(xx, cex=3, cex.main=2, cex.lab=5, cex.axis=1, mar=c(3,3,3,3), main="Title")
}

color.sites(dm)


I would like to:

1) add a legend explaining the colours (e.g. Orange = North);
2) make the leaf labels larger and bolder (cex.lab does not seem to do the job);
3) create a colour palette with sharply contrasting colours (rainbow, heat.colors etc. all seem to blend together when there are many leaves and colours in the dendrogram).

Any advice is greatly appreciated!

Elizabeth asked Aug 15 '12 10:08


2 Answers

If you already know how to use and tweak ggplot2 graphics, another solution is to use @Andrie's ggdendro package:

library(ggplot2)
library(ggdendro)

dm <- hclust(dist(USArrests[1:5,]), "ave")

df <- data.frame(State = c("Alabama","Alaska","Arizona","Arkansas","California"),
                 Location = c("South","North","West","South","West"))


hcdata<- dendro_data(dm, type="rectangle")

hcdata$labels <- merge(x = hcdata$labels, y = df,  by.x = "label", by.y = "State")


ggplot() +
 geom_segment(data=segment(hcdata), aes(x=x, y=y, xend=xend, yend=yend)) +
 geom_text(data = label(hcdata), aes(x=x, y=y, label=label, colour = Location, hjust=0), size=3) +
 geom_point(data = label(hcdata), aes(x=x, y=y), size=3, shape = 21) +
 coord_flip() +
 scale_y_reverse(expand=c(0.2, 0)) +
 scale_colour_brewer(palette = "Dark2") + 
 theme_dendro() 


dickoa answered Oct 18 '22 06:10


  1. Use legend()

    cols <- c("orange","forestgreen")
    legend("topright", legend = c("North","South"),
           fill = cols, border = cols, bty = "n")
    
  2. I don't believe you can, without hacking stats:::plot.dendrogram() as the labels are drawn with text() and graphical parameters are not passed on to that function. The relevant code in stats:::plot.dendrogram() is:

    if (!is.null(et <- attr(x, "edgetext"))) {
        my <- mean(hgt, yTop)
        if (horiz) 
            text(my, x0, et)
        else text(x0, my, et)
    }
    

    Copy the entire function source into an editor and edit it to do what you want, then assign it to your own function object and use it. If it fails because it can't find functions (they may be unexported from namespaces), find out which namespace they are in and prefix the offending function with ns:::, where ns is the relevant namespace.

  3. Try the RColorBrewer package for one option to choose categorical palettes.
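
The three points can be combined into one rough sketch on the question's data. It builds the legend from the factor levels instead of hard-coded colours, so the legend and the leaf colours cannot drift apart. It also assumes that the nodePar attribute accepts lab.cex and lab.font alongside the lab.col already used in the question (see ?dendrogram), which may make editing the plotting source unnecessary for point 2. The names location_pal and color_leaves are illustrative, and the "Dark2" palette and the base R fallback are just one possible choice.

```r
# Sketch only: reuses the question's data; object names are illustrative.
dm <- hclust(dist(USArrests[1:5, ]), "ave")
df <- data.frame(State    = c("Alabama", "Alaska", "Arizona", "Arkansas", "California"),
                 Location = c("South", "North", "West", "South", "West"))
df$ColorGroups <- factor(df$Location)

# One colour per factor level. RColorBrewer is used if it is installed
# (brewer.pal() needs n >= 3, and "Dark2" offers at most 8 colours);
# otherwise fall back to a base R qualitative palette (R >= 3.6).
n_groups <- nlevels(df$ColorGroups)
location_pal <- if (requireNamespace("RColorBrewer", quietly = TRUE)) {
  RColorBrewer::brewer.pal(max(3, n_groups), "Dark2")[seq_len(n_groups)]
} else {
  hcl.colors(n_groups, palette = "Dark 3")
}
names(location_pal) <- levels(df$ColorGroups)

# Set label colour, size and font on each leaf through nodePar.
# lab.font = 2 requests bold; pch = NA suppresses the leaf symbol.
color_leaves <- function(n) {
  if (is.leaf(n)) {
    grp <- as.character(df$ColorGroups[df$State == attr(n, "label")])
    attr(n, "nodePar") <- c(attr(n, "nodePar"),
                            list(lab.col = location_pal[[grp]],
                                 lab.cex = 1.5, lab.font = 2, pch = NA))
  }
  n
}
dend <- dendrapply(as.dendrogram(dm), color_leaves)

plot(dend, main = "Title")
# The legend reuses the same named vector that coloured the leaves.
legend("topright", legend = names(location_pal), fill = location_pal,
       border = location_pal, bty = "n", title = "Location")
```

With many leaves, larger labels can be clipped at the bottom of the plot; increasing the bottom margin with par(mar = ...) before plotting is one way to make room.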

Gavin Simpson answered Oct 18 '22 06:10