I have data that looks like this:
Name h1 h2 h3 h4 h5
1 1420468_at_Asb17 0.000 2.328 0.000 0.000 0.000
2 1430261_at_1700024J04Rik 1.236 2.050 0.000 0.000 0.000
3 1431788_at_Fabp12 0.000 2.150 0.000 0.000 0.587
4 1433187_at_B230112I24Rik 0.000 2.240 1.343 0.000 1.383
5 1434430_s_at_Adora2b 0.000 2.006 1.459 0.000 1.272
6 1435217_at_Gm7969 0.727 2.350 1.494 0.976 0.000
7 1436717_x_at_Hbb-y 0.000 2.712 0.000 0.000 0.000
8 1440859_at_Akap6 0.000 2.053 0.000 0.000 1.840
9 1442625_at_--- 0.000 2.064 1.173 0.000 1.035
10 1443715_at_Rbm24 0.969 2.219 0.000 0.000 0.000
11 1445520_at_--- 0.000 2.497 0.000 0.000 0.000
12 1446035_at_Gm7173 0.000 3.869 0.000 0.000 0.000
13 1446597_at_6820445E23Rik 1.000 2.000 0.000 0.000 0.000
14 1448925_at_Twist2 0.000 2.089 0.938 0.000 0.000
15 1449711_at_Atp6v1e1 0.605 2.363 2.350 1.094 0.976
16 1455931_at_Chrna3 0.000 2.354 0.000 0.000 0.000
17 1457647_x_at_1600023N17Rik 0.000 2.734 0.000 0.000 1.812
18 1458975_at_--- 0.000 2.079 0.000 0.000 0.000
19 1459862_at_--- 0.727 2.606 0.000 0.000 1.151
Note in this data (and the actual one) there is no negative values and the positive values can be as large as 100 or so.
What I want to do is to plot heat map with my own assigned color scale and scheme:
also without using any data scaling or built-in z-score transformation. How can I achieve that?
My current code is this:
library(gplots)
# Read data
dat <- read.table("http://dpaste.com/1501148/plain/",sep="\t",header=T);
rownames(dat) <- dat$Name
dat <- dat[,!names(dat) %in% c("Name")]
# Clustering and distance measure functions
hclustfunc <- function(x) hclust(x, method="complete")
distfunc <- function(x) dist(x,method="maximum")
# Define colours
hmcols <- rev(redgreen(2750));
# Plot
pdf("~/Desktop/tmp.pdf",height=10)
heatmap.2(as.matrix(dat),Colv=FALSE,dendrogram="row",scale="row",col=hmcols,trace="none", margin=c(5,10), hclust=hclustfunc,distfun=distfunc,lwid=c(1.5,2.0),keysize=1);
dev.off()
Which produces the following plot, where it uses the default z-score row scaling.
The key here is understanding that heatmap.2
uses the col
argument in combination with the breaks
argument.
Take a look at the code and figure below to see what I mean.
library(gplots)
set.seed(100)
dat = matrix( rexp(25,1/2), ncol=5 )
breaks = 0:5
col = c("green","blue","red","yellow","brown")
heatmap.2( dat, breaks=breaks, col=col )
As you can see, there must be n-1
colors for n
breaks. For your particular question, the problem is to map the correct colors to the breaks. I'm using the scale="none"
option as @josilber pointed out.
breaks = seq(0,max(dat),length.out=1000)
gradient1 = colorpanel( sum( breaks[-1]<=1 ), "white", "green", "black" )
gradient2 = colorpanel( sum( breaks[-1]>1 ), "black", "red" )
hm.colors = c(gradient1,gradient2)
heatmap.2(as.matrix(dat),scale="none",breaks=breaks,col=hm.colors,
Colv=FALSE,dendrogram="row",trace="none",
margin=c(5,10), hclust=hclustfunc,distfun=distfunc,lwid=c(1.5,2.0))
Another alternative would be to have two gradients: green->black and black->red. Then, you could manually set the zero values to white by making them NA
and setting na.color="white"
.
breaks = seq(0,max(dat),length.out=1000)
gradient1 = colorpanel( sum( breaks[-1]<=1 ), "green", "black" )
gradient2 = colorpanel( sum( breaks[-1]>1 ), "black", "red" )
hm.colors = c(gradient1,gradient2)
dat[dat==0] = NA
heatmap.2(as.matrix(dat),scale="none",breaks=breaks,col=hm.colors,na.color="white",
Colv=FALSE,dendrogram="row",trace="none",
margin=c(5,10), hclust=hclustfunc,distfun=distfunc,lwid=c(1.5,2.0))
And finally, you could just manually edit the gradient for the zero values.
breaks = seq(0,max(dat),length.out=1000)
gradient1 = colorpanel( sum( breaks[-1]<=1 ), "green", "black" )
gradient2 = colorpanel( sum( breaks[-1]>1 ), "black", "red" )
hm.colors = c(gradient1,gradient2)
hm.colors[1] = col2hex("white")
heatmap.2(as.matrix(dat),scale="none",breaks=breaks,col=hm.colors,na.color="white",
Colv=FALSE,dendrogram="row",trace="none",
margin=c(5,10), hclust=hclustfunc,distfun=distfunc,lwid=c(1.5,2.0))
Log fold changes
On another note, it appears that you might be looking at fold changes or some type of ratio. It is fairly common to plot the log fold changes when making a heat map. I "greyed" out the zero values.
dat[dat==0] = NA
heatmap.2( as.matrix(log2(dat)), col=greenred(100),
scale="none", na.color="grey",symbreaks=TRUE,
Colv=FALSE,dendrogram="row",trace="none",
margin=c(5,10), hclust=hclustfunc,distfun=distfunc,lwid=c(1.5,2.0))
For an explanation of @josilber's nice solution:
This code hmcols <- c(colfunc1(200), colfunc2(200*(max(dat) - 1)))
makes
a character vector of length 774 (seen by length(hmcols)
). Thus, this means that there should be 775 breaks defined. The heatmap.2
function by default makes n+1
breaks where n
is the length of the vector used in the col
argument. So the number of breaks and colors is worked out, but how does hmcols <- c(colfunc1(200), colfunc2(200*(max(dat) - 1)))
map the colors to the breaks correctly? The trick is in clever way that the hmcols
vector was created. The number of colors in the first gradient is 200. Since breaks
was not explicitly defined, we know that the breaks will be evenly spaced. Since the first gradient goes from 0 to 1 and there are 200 breaks, the width of each break should be 0.005 (or 1/200). Since the second gradient goes from 1 to 3.869 (max(dat)
), there should be 2.869/0.005=573.8 breaks (574 breaks when rounding up). Note that the 200*(max(dat) - 1))
does this calculation; it outputs 573.8. Thus, there are then 200+574 colors mapped to the correct breaks and everything works!
I think there are two things here. The first is how to get rid of the z-scores. This can be done with the scale="none"
parameter to heatmap.2
.
The other question surrounds your desired gradient. I relied on colorRampPalette
for this part. Below, I construct a gradient that goes white -> green -> black for values 0 through 1 and then goes black -> red for values 1 -> max(dat)
.
library(gplots)
# Read data
dat <- read.table("http://dpaste.com/1501148/plain/",sep="\t",header=T);
rownames(dat) <- dat$Name
dat <- dat[,!names(dat) %in% c("Name")]
# Clustering and distance measure functions
hclustfunc <- function(x) hclust(x, method="complete")
distfunc <- function(x) dist(x,method="maximum")
# Define colours
colfunc1 <- colorRampPalette(c("white", "green", "black"))
colfunc2 <- colorRampPalette(c("black", "red"))
hmcols <- c(colfunc1(200), colfunc2(200*(max(dat) - 1)))
# Plot
pdf("~/Desktop/tmp.pdf",height=10)
heatmap.2(as.matrix(dat),Colv=FALSE,dendrogram="row",scale="none",col=hmcols,trace="none", margin=c(5,10), hclust=hclustfunc,distfun=distfunc,lwid=c(1.5,2.0),keysize=1);
dev.off()
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With