Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Partial Row Labels Heatmap - R

I was wondering if anyone knows of a package that allows partial row labeling of heatmaps. I am currently using pheatmap() to construct my heatmaps, but I can use any package that has this functionality.

I have plots with many rows of differentially expressed genes and I would like to label a subset of them. There are two main things to consider (that I can think of):

  • The placement of the text annotation depends on the height of the row. If the rows are too narrow, then the text label will be ambiguous without some sort of pointer.
  • If multiple adjacent rows are significant (i.e. will be labelled), then these will need to be offset, and again, a pointer will be needed.

Below is an example of a partial solution that really only gets maybe halfway there, but I hope illustrates what I'd like to be able to do.

set.seed(1)
require(pheatmap)
require(RColorBrewer)
require(grid)

### Data to plot
data_mat <- matrix(sample(1:10000, 300), nrow = 50, ncol = 6)
rownames(data_mat) <- paste0("Gene", 1:50)
colnames(data_mat) <- c(paste0("A", 1:3), paste0("B", 1:3))

### Set how many genes to annotate
  ### TRUE - make enough labels that some overlap
  ### FALSE - no overlap
tooMany <- T

### Select a few genes to annotate
if (tooMany) {
  sigGenes_v <- paste0("Gene", c(5,20,26,42,47,16,28))
  newMain_v <- "Too Many Labels"
} else {
  sigGenes_v <- paste0("Gene", c(5,20,26,42))
  newMain_v <- "OK Labels"
}

### Make color list
colors_v <- brewer.pal(8, "Dark2")
colors_v <- colors_v[c(1:length(sigGenes_v), 8)]
names(colors_v) <- c(sigGenes_v, "No")
annColors_lsv <- list("Sig" = colors_v)

### Column Metadata
colMeta_df <- data.frame(Treatment = c(rep("A", 3), rep("B", 3)),
                      Replicate = c(rep(1:3, 2)),
                      stringsAsFactors = F, 
                      row.names = colnames(data_mat))

### Row metadata
rowMeta_df <- data.frame(Sig = rep("No", 50), 
                      stringsAsFactors = F,
                      row.names = rownames(data_mat))
for (gene_v in sigGenes_v) rowMeta_df[rownames(rowMeta_df) == gene_v, "Sig"] <- gene_v

### Heatmap
heat <- pheatmap(data_mat,
                 annotation_row = rowMeta_df,
                 annotation_col = colMeta_df,
                 annotation_colors = annColors_lsv,
                 cellwidth = 10,
                 main = "Original Heat")

### Get order of genes after clustering
genesInHeatOrder_v <- heat$tree_row$labels[heat$tree_row$order]
whichSigInHeatOrder_v <- which(genesInHeatOrder_v %in% sigGenes_v)
whichSigInHeatOrderLabels_v <- genesInHeatOrder_v[whichSigInHeatOrder_v]

sigY <- 1 - (0.02 * whichSigInHeatOrder_v)

### Change title
whichMainGrob_v <- which(heat$gtable$layout$name == "main")
heat$gtable$grobs[[whichMainGrob_v]] <- textGrob(label = newMain_v, 
                                                 gp = gpar(fontsize = 16))

### Remove rows
whichRowGrob_v <- which(heat$gtable$layout$name == "row_names")
heat$gtable$grobs[[whichRowGrob_v]] <- textGrob(label = whichSigInHeatOrderLabels_v,
                                                y = sigY,
                                                vjust = 1)
grid.newpage()
grid.draw(heat)

Here are a few outputs:

original heatmap: original heatmap

ok labels: ok labels

ok labels, with flags: ok labels, with flags

too many labels too many labels

too many labels, with flags too many labels, with flags

The "with flags" outputs are the desired final results.
I just saved these as images from the Rstudio plot viewer. I recognize that I could save them as pdfs and provide a larger file size to get rid of the label overlap, but then the individual cells would be larger than I want.

like image 417
Qwfqwf Avatar asked Oct 01 '18 21:10

Qwfqwf


1 Answers

Based on your code, you seem fairly comfortable with gtables & grobs. A (relatively) straightforward way to achieve the look you want is to zoom in on the row label grob, & make some changes there:

  1. replace unwanted labels with "";
  2. evenly spread out labels within the available space;
  3. add line segments joining the old and new label positions.

I wrote a wrapper function for this, which works as follows:

# heat refers to the original heatmap produced from the pheatmap() function
# kept.labels should be a vector of labels you wish to show
# repel.degree is a number in the range [0, 1], controlling how much the
# labels are spread out from one another

add.flag(heat,
         kept.labels = sigGenes_v,
         repel.degree = 0)

add.flag(heat,
         kept.labels = sigGenes_v,
         repel.degree = 0.5)

add.flag(heat,
         kept.labels = sigGenes_v,
         repel.degree = 1)

plot

Function (explanations in annotations):

add.flag <- function(pheatmap,
                     kept.labels,
                     repel.degree) {

  # repel.degree = number within [0, 1], which controls how much 
  #                space to allocate for repelling labels.
  ## repel.degree = 0: spread out labels over existing range of kept labels
  ## repel.degree = 1: spread out labels over the full y-axis

  heatmap <- pheatmap$gtable

  new.label <- heatmap$grobs[[which(heatmap$layout$name == "row_names")]] 

  # keep only labels in kept.labels, replace the rest with ""
  new.label$label <- ifelse(new.label$label %in% kept.labels, 
                            new.label$label, "")

  # calculate evenly spaced out y-axis positions
  repelled.y <- function(d, d.select, k = repel.degree){
    # d = vector of distances for labels
    # d.select = vector of T/F for which labels are significant

    # recursive function to get current label positions
    # (note the unit is "npc" for all components of each distance)
    strip.npc <- function(dd){
      if(!"unit.arithmetic" %in% class(dd)) {
        return(as.numeric(dd))
      }

      d1 <- strip.npc(dd$arg1)
      d2 <- strip.npc(dd$arg2)
      fn <- dd$fname
      return(lazyeval::lazy_eval(paste(d1, fn, d2)))
    }

    full.range <- sapply(seq_along(d), function(i) strip.npc(d[i]))
    selected.range <- sapply(seq_along(d[d.select]), function(i) strip.npc(d[d.select][i]))

    return(unit(seq(from = max(selected.range) + k*(max(full.range) - max(selected.range)),
                    to = min(selected.range) - k*(min(selected.range) - min(full.range)), 
                    length.out = sum(d.select)), 
                "npc"))
  }
  new.y.positions <- repelled.y(new.label$y,
                                d.select = new.label$label != "")
  new.flag <- segmentsGrob(x0 = new.label$x,
                           x1 = new.label$x + unit(0.15, "npc"),
                           y0 = new.label$y[new.label$label != ""],
                           y1 = new.y.positions)

  # shift position for selected labels
  new.label$x <- new.label$x + unit(0.2, "npc")
  new.label$y[new.label$label != ""] <- new.y.positions

  # add flag to heatmap
  heatmap <- gtable::gtable_add_grob(x = heatmap,
                                   grobs = new.flag,
                                   t = 4, 
                                   l = 4
  )

  # replace label positions in heatmap
  heatmap$grobs[[which(heatmap$layout$name == "row_names")]] <- new.label

  # plot result
  grid.newpage()
  grid.draw(heatmap)

  # return a copy of the heatmap invisibly
  invisible(heatmap)
}
like image 99
Z.Lin Avatar answered Nov 20 '22 02:11

Z.Lin