Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to assign your color scale on raw data in heatmap.2()

Tags:

plot

r

I have data that looks like this:

                         Name    h1    h2    h3    h4    h5
1            1420468_at_Asb17 0.000 2.328 0.000 0.000 0.000
2    1430261_at_1700024J04Rik 1.236 2.050 0.000 0.000 0.000
3           1431788_at_Fabp12 0.000 2.150 0.000 0.000 0.587
4    1433187_at_B230112I24Rik 0.000 2.240 1.343 0.000 1.383
5        1434430_s_at_Adora2b 0.000 2.006 1.459 0.000 1.272
6           1435217_at_Gm7969 0.727 2.350 1.494 0.976 0.000
7          1436717_x_at_Hbb-y 0.000 2.712 0.000 0.000 0.000
8            1440859_at_Akap6 0.000 2.053 0.000 0.000 1.840
9              1442625_at_--- 0.000 2.064 1.173 0.000 1.035
10           1443715_at_Rbm24 0.969 2.219 0.000 0.000 0.000
11             1445520_at_--- 0.000 2.497 0.000 0.000 0.000
12          1446035_at_Gm7173 0.000 3.869 0.000 0.000 0.000
13   1446597_at_6820445E23Rik 1.000 2.000 0.000 0.000 0.000
14          1448925_at_Twist2 0.000 2.089 0.938 0.000 0.000
15        1449711_at_Atp6v1e1 0.605 2.363 2.350 1.094 0.976
16          1455931_at_Chrna3 0.000 2.354 0.000 0.000 0.000
17 1457647_x_at_1600023N17Rik 0.000 2.734 0.000 0.000 1.812
18             1458975_at_--- 0.000 2.079 0.000 0.000 0.000
19             1459862_at_--- 0.727 2.606 0.000 0.000 1.151

Note in this data (and the actual one) there is no negative values and the positive values can be as large as 100 or so.

What I want to do is to plot heat map with my own assigned color scale and scheme:

  1. When the value is 0 set it into white.
  2. When the value is == 1 set it into black.
  3. When the value is > 1 set it into shade of reds.
  4. When the value is < 1 and > 0 set it into shade of greens.

also without using any data scaling or built-in z-score transformation. How can I achieve that?

My current code is this:

library(gplots)

# Read data
dat <- read.table("http://dpaste.com/1501148/plain/",sep="\t",header=T);
rownames(dat) <- dat$Name
dat <- dat[,!names(dat) %in% c("Name")]

# Clustering and distance measure functions
hclustfunc <- function(x) hclust(x, method="complete")
distfunc <- function(x) dist(x,method="maximum")

#  Define colours
hmcols <- rev(redgreen(2750));

# Plot 
pdf("~/Desktop/tmp.pdf",height=10)
heatmap.2(as.matrix(dat),Colv=FALSE,dendrogram="row",scale="row",col=hmcols,trace="none", margin=c(5,10), hclust=hclustfunc,distfun=distfunc,lwid=c(1.5,2.0),keysize=1);
dev.off()

Which produces the following plot, where it uses the default z-score row scaling.

enter image description here

like image 847
neversaint Avatar asked Dec 12 '13 05:12

neversaint


2 Answers

The key here is understanding that heatmap.2 uses the col argument in combination with the breaks argument.

Take a look at the code and figure below to see what I mean.

library(gplots)
set.seed(100)
dat = matrix( rexp(25,1/2), ncol=5 )
breaks = 0:5
col = c("green","blue","red","yellow","brown")
heatmap.2( dat, breaks=breaks, col=col )

enter image description here

As you can see, there must be n-1 colors for n breaks. For your particular question, the problem is to map the correct colors to the breaks. I'm using the scale="none" option as @josilber pointed out.

breaks = seq(0,max(dat),length.out=1000)
gradient1 = colorpanel( sum( breaks[-1]<=1 ), "white", "green", "black" )
gradient2 = colorpanel( sum( breaks[-1]>1 ), "black", "red" )
hm.colors = c(gradient1,gradient2)

heatmap.2(as.matrix(dat),scale="none",breaks=breaks,col=hm.colors,
          Colv=FALSE,dendrogram="row",trace="none", 
          margin=c(5,10), hclust=hclustfunc,distfun=distfunc,lwid=c(1.5,2.0))

enter image description here

Another alternative would be to have two gradients: green->black and black->red. Then, you could manually set the zero values to white by making them NA and setting na.color="white".

breaks = seq(0,max(dat),length.out=1000)
gradient1 = colorpanel( sum( breaks[-1]<=1 ), "green", "black" )
gradient2 = colorpanel( sum( breaks[-1]>1 ), "black", "red" )
hm.colors = c(gradient1,gradient2)

dat[dat==0] = NA
heatmap.2(as.matrix(dat),scale="none",breaks=breaks,col=hm.colors,na.color="white",
          Colv=FALSE,dendrogram="row",trace="none", 
          margin=c(5,10), hclust=hclustfunc,distfun=distfunc,lwid=c(1.5,2.0))

And finally, you could just manually edit the gradient for the zero values.

breaks = seq(0,max(dat),length.out=1000)
gradient1 = colorpanel( sum( breaks[-1]<=1 ), "green", "black" )
gradient2 = colorpanel( sum( breaks[-1]>1 ), "black", "red" )
hm.colors = c(gradient1,gradient2)
hm.colors[1] = col2hex("white")

heatmap.2(as.matrix(dat),scale="none",breaks=breaks,col=hm.colors,na.color="white",
          Colv=FALSE,dendrogram="row",trace="none", 
          margin=c(5,10), hclust=hclustfunc,distfun=distfunc,lwid=c(1.5,2.0))

enter image description here

Log fold changes

On another note, it appears that you might be looking at fold changes or some type of ratio. It is fairly common to plot the log fold changes when making a heat map. I "greyed" out the zero values.

dat[dat==0] = NA
heatmap.2( as.matrix(log2(dat)), col=greenred(100), 
           scale="none", na.color="grey",symbreaks=TRUE,
           Colv=FALSE,dendrogram="row",trace="none", 
           margin=c(5,10), hclust=hclustfunc,distfun=distfunc,lwid=c(1.5,2.0))

enter image description here

For an explanation of @josilber's nice solution:

This code hmcols <- c(colfunc1(200), colfunc2(200*(max(dat) - 1))) makes a character vector of length 774 (seen by length(hmcols)). Thus, this means that there should be 775 breaks defined. The heatmap.2 function by default makes n+1 breaks where n is the length of the vector used in the col argument. So the number of breaks and colors is worked out, but how does hmcols <- c(colfunc1(200), colfunc2(200*(max(dat) - 1))) map the colors to the breaks correctly? The trick is in clever way that the hmcols vector was created. The number of colors in the first gradient is 200. Since breaks was not explicitly defined, we know that the breaks will be evenly spaced. Since the first gradient goes from 0 to 1 and there are 200 breaks, the width of each break should be 0.005 (or 1/200). Since the second gradient goes from 1 to 3.869 (max(dat)), there should be 2.869/0.005=573.8 breaks (574 breaks when rounding up). Note that the 200*(max(dat) - 1)) does this calculation; it outputs 573.8. Thus, there are then 200+574 colors mapped to the correct breaks and everything works!

like image 174
kdauria Avatar answered Nov 03 '22 11:11

kdauria


I think there are two things here. The first is how to get rid of the z-scores. This can be done with the scale="none" parameter to heatmap.2.

The other question surrounds your desired gradient. I relied on colorRampPalette for this part. Below, I construct a gradient that goes white -> green -> black for values 0 through 1 and then goes black -> red for values 1 -> max(dat).

library(gplots)

# Read data
dat <- read.table("http://dpaste.com/1501148/plain/",sep="\t",header=T);
rownames(dat) <- dat$Name
dat <- dat[,!names(dat) %in% c("Name")]

# Clustering and distance measure functions
hclustfunc <- function(x) hclust(x, method="complete")
distfunc <- function(x) dist(x,method="maximum")

#  Define colours
colfunc1 <- colorRampPalette(c("white", "green", "black"))
colfunc2 <- colorRampPalette(c("black", "red"))
hmcols <- c(colfunc1(200), colfunc2(200*(max(dat) - 1)))

# Plot 
pdf("~/Desktop/tmp.pdf",height=10)
heatmap.2(as.matrix(dat),Colv=FALSE,dendrogram="row",scale="none",col=hmcols,trace="none", margin=c(5,10), hclust=hclustfunc,distfun=distfunc,lwid=c(1.5,2.0),keysize=1);
dev.off()

enter image description here

like image 29
josliber Avatar answered Nov 03 '22 11:11

josliber