Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I make cell size in an heatmap mediate data resolution using R?

Tags:

r

heatmap

Given the following example:

X <- matrix(nrow=3, ncol=3)
X[1,] <- c(0.3, 0.4, 0.45)
X[2,] <- c(0.3, 0.7, 0.65)
X[3,] <- c(0.3, 0.4, 0.45)
colnames(X)<-c(1.5, 3, 4)
rownames(X)<-c(1.5, 3, 4)

A heatmap ( heatmap(X, Rowv=NA, Colv=NA, col=rev(heat.colors(256))) ) will look like:

enter image description here

Now, say that the variables on the axes are parameters affecting some function, the distance between 3 and 4 is smaller than the distance between 1 and 3 and I would like the cell size of the heat map to reflect this. How can I make a heat map where the cell size reflects the resolution of the known data?

I am thinking of something that looks a bit like this:

sketch of what I would like

Do libraries for creating something like this exist? If no, is it because I am missing something? If so, what?

like image 478
jonalv Avatar asked May 17 '13 09:05

jonalv


2 Answers

In a set of functions I personally use, I have a function for drawing two dimensional histograms that you could use. I have included the code below:

#' Plot two dimensional histogram
#'
#' @param hist matrix or two dimensional array containing the number of counts
#' in each of the bins.
#' @param borders_x the x-borders of the bins in the histogram. Should be a
#' numeric vector with lenght one longer than the number of columns of
#' \code{hist}
#' @param borders_y the y-borders of the bins in the histogram. Should be a
#' numeric vector with lenght one longer than the number of rows of
#' \code{hist}
#' @param type a character specifying the type of plot. Valid values are "text",
#' "area" and "color". See details for more information.
#' @param add add the plot to an existing one or create a new plot.
#' @param add_lines logical specifying whether or not lines should be drawn
#' between the bins.
#' @param draw_empty if \code{FALSE} empty bins (numer of counts equal to zero)
#' are not drawn. They are shown using the background color.
#' @param col for types "area" and "text" the color of the boxes and text.
#' @param line_col the color of the lines between the bins.
#' @param background_col the background color of the bins.
#' @param lty the line type of the lines between the bins.
#' @param text_cex the text size used for type "text". See \code{\link{par}} for
#' more information.
#' @param col_range the color scale used for type "color". Should be a function
#' which accepts as first argument the number of colors that should be
#' generated. The first color generated is used for the zero counts; the
#' last color for the highest number of counts in the histogram.
#' @param ... additional arguments are passed on to \code{\link{plot}}.
#'
#' @details
#' There are three plot types: "area", "text", and "color". In case of "area"
#' rectangles are drawn inside the bins with area proportional to the number of
#' counts in the bins. In case of text the number of counts is shown as text in
#' the bins. In case of color a color scale is used (by default heat.colors) to
#' show the number of counts.
#'
#' @seealso \code{\link{image}} which can be used to create plots similar to
#' type "color". \code{\link{contour}} may also be of interest.
#'
#' @examples
#' histplot2(volcano - min(volcano), type="color")
#' histplot2(volcano - min(volcano), add_lines=FALSE, type="area")
#' histplot2(volcano - min(volcano), type="text", text_cex=0.5)
#'
#' @export
histplot2 <- function(hist, borders_x=seq(0, ncol(hist)),
        borders_y=seq(0, nrow(hist)), type="area", add=FALSE, add_lines=TRUE,
        draw_empty=FALSE, col="black", line_col="#00000030",
        background_col="white", lty=1, text_cex=0.6, col_range=heat.colors, ...) {
    # create new plot
    rangex <- c(min(borders_x), max(borders_x))
    rangey <- c(min(borders_y), max(borders_y))
    if (add == FALSE) {
        plot(rangex, rangey, type='n', xaxs='i', yaxs='i', ...)
        rect(rangex[1], rangey[1], rangex[2], rangey[2], col=background_col,
            border=NA)
    }
    # prepare data
    nx <- length(borders_x)-1
    ny <- length(borders_y)-1
    wx <- rep(diff(borders_x), each=ny)
    wy <- rep(diff(borders_y), times=nx)
    sx <- 0.95*min(wx)/sqrt(max(hist))
    sy <- 0.95*min(wy)/sqrt(max(hist))
    x <- rep((borders_x[-length(borders_x)] + borders_x[-1])/2, each=ny)
    y <- rep((borders_y[-length(borders_y)] + borders_y[-1])/2, times=nx)
    h <- as.numeric(hist)
    # plot type "area"
    if (type == "area") {
        dx <- sqrt(h)*sx*0.5
        dy <- sqrt(h)*sy*0.5
        rect(x-dx, y-dy, x+dx, y+dy, col=col, border=NA)
    # plot type "text"
    } else if (type == "text") {
        if (draw_empty) {
            text(x, y, format(h), cex=text_cex, col=col)
        } else {
            text(x[h!=0], y[h!=0], format(h[h!=0]), cex=text_cex, col=col)
        }
    # plot type "color"
    } else if (type == "color" | type == "colour") {
        #h <- h/(wx*wy)
        col <- col_range(200)
        col <- col[floor(h/max(h)*200*(1-.Machine$double.eps))+1]
        sel <- rep(TRUE, length(x))
        if (!draw_empty) sel <- h > 0
        rect(x[sel]-wx[sel]/2, y[sel]-wy[sel]/2, x[sel]+wx[sel]/2,
            y[sel]+wy[sel]/2, col=col[sel], border=NA)
    } else {
        stop("Unknown plot type: options are 'area', 'text' and 'color'.")
    }
    # add_lines
    if (add_lines) {
        lines(rbind(borders_x, borders_x, NA),
            rbind(rep(rangey[1], nx+1), rep(rangey[2], nx+1), NA),
            col=line_col, lty=lty)
        lines(rbind(rep(rangex[1], ny+1), rep(rangex[2], ny+1), NA),
            rbind(borders_y, borders_y, NA), col=line_col, lty=lty)
    }
    # add border
    if (add == FALSE) box()
}

For your example this results in:

X <- matrix(nrow=3, ncol=3)
X[1,] <- c(0.3, 0.4, 0.45)
X[2,] <- c(0.3, 0.7, 0.65)
X[3,] <- c(0.3, 0.4, 0.45)
centers <- c(1.5, 3, 4)

centers_to_borders <- function(centers) {
    nc <- length(centers)
    d0 <- centers[2]-centers[1]
    d1 <- centers[nc]-centers[nc-1]
    c(centers[1]-d0/2, 
      (centers[2:nc] + centers[1:(nc-1)])/2, centers[nc]+d1/2)
}

histplot2(X, centers_to_borders(centers), 
    centers_to_borders(centers), type="color")

graph resulting from code above

Edit

Below is a rough function that creates a color legend:

plot_range <- function(hist, col_range = heat.colors) {
    r <- range(c(0, X))
    par(cex=0.7, mar=c(8, 1, 8, 2.5))
    plot(0, 0, type='n', xlim=c(0,1), ylim=r, xaxs='i',
        yaxs='i', bty='n', xaxt='n', yaxt='n', xlab='', ylab='')
    axis(4)
    y <- seq(r[1], r[2], length.out=200)
    yc <- floor(y/max(y)*5*(1-.Machine$double.eps))+1
    col <- col_range(5)[yc]
    b <- centers_to_borders(y)
    rect(rep(0, length(y)), b[-length(b)], rep(1, length(y)), 
        b[-1], col=col, border=NA)
}

You could add this legend to your plot using layout:

layout(matrix(c(1,2), nrow = 1), widths = c(0.9, 0.1))
par(mar = c(5, 4, 4, 2) + 0.1)
histplot2(X, centers_to_borders(centers), 
    centers_to_borders(centers), type="color")
plot_range(X)

enter image description here

... adjust to your need

Edit 2

In the original code of histplot2 there was a line h <- h/(wx*wy) which I now have commented out. This devided the values of the histogram by the area of the bin, which is often what you want, but probably not in this case.

like image 128
Jan van der Laan Avatar answered Oct 14 '22 11:10

Jan van der Laan


Something like this, maybe?

library(ggplot2)
library(reshape2)

X <- matrix(nrow=3, ncol=3)
X[1,] <- c(0.3, 0.4, 0.45)
X[2,] <- c(0.3, 0.7, 0.65)
X[3,] <- c(0.3, 0.4, 0.45)


colnames(X)<-c(1.5, 3, 4)
rownames(X)<-c(1.5, 3, 4)
X <- melt(X)
X <- as.data.frame(X)
names(X) <- c("Var1", "Var2", "value")
v1m <- unique(X$Var1)
X$Var1.min <- rep(c(0, v1m[-length(v1m)]), length.out = length(v1m))
v2m <- unique(X$Var2)
X$Var2.min <- rep(c(0, v2m[-length(v2m)]), each = length(v2m))

ggplot(data = X, aes(fill = value)) + 
    geom_rect(aes(ymin = Var1.min, ymax = Var1, xmin = Var2.min, xmax = Var2))

graph

like image 38
Noah Avatar answered Oct 14 '22 10:10

Noah