I am trying to create a simple heatmap in R, using a diverging colour palette. I want to use a gradient so that all numbers below a threshold N are designated a color (say purple), and all numbers above the threshold are designated another color (say orange). The further away the number is from the threshold, the darker the color should be.
Here is a sample dataset:
Division,COL1,COL2,COL3,COL4,COL5,COL6,COL7
Division 1,31.9221884012222,75.8181694429368,97.0480443444103,96.295954938978,70.5677134916186,63.0451830103993,93.0396212730557
Division 2,85.7012346852571,29.0621076244861,16.9130333233625,94.6443660184741,19.9103083927184,61.9562198873609,72.3791105207056
Division 3,47.1665125340223,99.4153356179595,8.51091076619923,79.1276383213699,41.915355855599,7.45079894550145,24.6946100145578
Division 4,66.0743870772421,24.6163331903517,78.694460215047,42.04714265652,50.2694897353649,73.0409651994705,87.3745442833751
Division 5,29.6664374880493,35.4036891367286,19.2967326845974,5.48460693098605,32.4517334811389,15.5926876701415,76.0523204226047
Division 6,95.4969164915383,8.63230894319713,61.7535551078618,24.5590241160244,25.5453423131257,56.397921172902,44.4693325087428
Division 7,87.5015622004867,28.7770316936076,56.5095080062747,34.6680747810751,28.1923673115671,65.0204187724739,13.795713102445
Division 8,70.1077231671661,72.4712177179754,38.4903231170028,36.1821102909744,97.0875509083271,17.184783378616,78.2292529474944
Division 9,47.3570406902581,90.2257485780865,65.6037972308695,77.0234781783074,25.6294377148151,84.900529962033,82.5080851092935
Division 10,58.0811711959541,0.493217632174492,58.5604055318981,53.5780876874924,9.12552657537162,20.313960686326,78.1371118500829
Division 11,34.6708688884974,76.711881859228,22.6064443588257,22.1724311355501,5.48891355283558,79.1159523651004,56.8405059166253
Division 12,33.6812808644027,44.1363711375743,70.6362190190703,3.78900407813489,16.6075889021158,9.12654218263924,39.9711143691093
Here is a simple snippet to produce a heatmap from the above data:
data <- read.csv("dataset.csv", sep=",")
row.names(data) <- data$Division
data <- data[,2:7]
data_matrix <- data.matrix(data)
heatmap(data_matrix, Rowv=NA, Colv=NA, col = heat.colors(256), scale="column", margins=c(5,10))
How can I modify the above code to produce:
[[Edit]]
I have just seen this question on SO, which seems to be very similar. The answer uses ggplot (which I have no experience of), and so far I have been unable to adapt the ggplot solution to my slightly more complicated data.
This should get you most of the way. (Note that you'll need to set scale="none" if you want the plotted colors to correspond to the actual, rather than the rescaled, values of the cells.)
ncol <- 100
## Make a vector with ncol colors
cols <- RColorBrewer::brewer.pal(11, "PuOr") # OR c("purple","white","orange")
rampcols <- colorRampPalette(colors = cols, space="Lab")(ncol)
## Highlight the color class just above the midpoint in green
rampcols[(ncol/2) + 1] <- rgb(t(col2rgb("green")), maxColorValue=255)
## Make a vector with ncol+1 breaks
rampbreaks <- seq(0, 100, length.out = ncol+1)
## Try it out
heatmap(data_matrix, Rowv = NA, Colv = NA, scale="none",
col = rampcols, breaks = rampbreaks)
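On the scale note above: as far as I can tell, scale="column" centers each column on its mean and divides by its standard deviation before colors are assigned, so a threshold given in the original units no longer lines up with the plotted values. Here is a small sketch with a made-up matrix `m` (not the question's data), using base R's `scale()` as a stand-in for what `heatmap` does internally:

```r
## Made-up 3 x 2 matrix, filled column by column
m <- matrix(c(10, 50, 90,
              40, 50, 60),
            nrow = 3,
            dimnames = list(paste("Division", 1:3), c("COL1", "COL2")))

## Center and scale each column (assumed equivalent of scale="column")
scaled <- scale(m)

range(m)       # raw values, in the original units
range(scaled)  # rescaled values, centered on 0 in every column
```

A raw threshold such as 50 means nothing on the rescaled values, which is why the breaks above only make sense with scale="none".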
EDIT
For finer control over the placement of the threshold, I'd suggest creating two separate palettes -- one for values less than the threshold and one for values above the threshold -- and then "suturing" them together. Try something like this, playing around with different values for Min, Max, Thresh, etc.:
nHalf <- 50
Min <- 0
Max <- 100
Thresh <- 50
## Make vector of colors for values below threshold
rc1 <- colorRampPalette(colors = c("purple", "white"), space="Lab")(nHalf)
## Make vector of colors for values above threshold
rc2 <- colorRampPalette(colors = c("white", "orange"), space="Lab")(nHalf)
rampcols <- c(rc1, rc2)
## In your example, this line sets the color for values between 49 and 51.
rampcols[c(nHalf, nHalf+1)] <- rgb(t(col2rgb("green")), maxColorValue=255)
rb1 <- seq(Min, Thresh, length.out=nHalf+1)
rb2 <- seq(Thresh, Max, length.out=nHalf+1)[-1]
rampbreaks <- c(rb1, rb2)
heatmap(data_matrix, Rowv = NA, Colv = NA, scale="none",
col = rampcols, breaks = rampbreaks)
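To see where a given cell value lands in the sutured palette, the sketch below rebuilds `rampcols` and `rampbreaks` with an off-center threshold of 30 and looks up the color class with `cut()`. The helper name `color_index` is made up for illustration, and the right-closed intervals are my assumption about how `image()` (which `heatmap` calls) bins values:

```r
nHalf  <- 50
Min    <- 0
Max    <- 100
Thresh <- 30   # deliberately off-center

## Two half-palettes, joined at white
rc1 <- colorRampPalette(c("purple", "white"), space = "Lab")(nHalf)
rc2 <- colorRampPalette(c("white", "orange"), space = "Lab")(nHalf)
rampcols <- c(rc1, rc2)

## nHalf breaks on each side of the threshold, sharing the threshold itself
rampbreaks <- c(seq(Min, Thresh, length.out = nHalf + 1),
                seq(Thresh, Max, length.out = nHalf + 1)[-1])

## Hypothetical helper: index of the color class a value falls into
color_index <- function(x) {
  as.integer(cut(x, breaks = rampbreaks, include.lowest = TRUE))
}

color_index(c(0, 29, 31, 100))
rampcols[color_index(c(29, 31))]
```

Values below 30 get an index of 50 or less (the purple half), and values above 30 get 51 or more (the orange half), even though the two halves cover ranges of different widths.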
I found this thread very useful and also pulled some ideas from here, but for my purposes I needed to generalize some things and wanted to use the RColorBrewer package. While I was working on it, Dr. Brewer (of ColorBrewer fame) stopped by my office and told me I needed to interpolate within the smaller color breaks rather than just pick the end points. I thought others might find this useful, so I am posting my function here for posterity.
The function takes your data vector, the name of a diverging ColorBrewer palette, and the center point for your color scheme (default is 0). It returns a list containing two objects: a classIntervals object and a vector of colors. The function is set to interpolate a total of 100 colors, but that can be modified with some care.
library(RColorBrewer)  # provides brewer.pal()
library(classInt)      # provides classIntervals()
diverge.color <- function(data,pal_choice="RdGy",centeredOn=0){
nHalf=50
Min <- min(data,na.rm=TRUE)
Max <- max(data,na.rm=TRUE)
Thresh <- centeredOn
pal<-brewer.pal(n=11,pal_choice)
rc1<-colorRampPalette(colors=c(pal[1],pal[2]),space="Lab")(10)
for(i in 2:10){
tmp<-colorRampPalette(colors=c(pal[i],pal[i+1]),space="Lab")(10)
rc1<-c(rc1,tmp)
}
rb1 <- seq(Min, Thresh, length.out=nHalf+1)
rb2 <- seq(Thresh, Max, length.out=nHalf+1)[-1]
rampbreaks <- c(rb1, rb2)
cuts <- classIntervals(data, style="fixed",fixedBreaks=rampbreaks)
return(list(cuts,rc1))
}
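The pairwise interpolation inside the function can be tried without RColorBrewer or classInt. This is only a sketch: `interp_pairs` is a hypothetical helper, and `pal` is a hand-picked stand-in for a ColorBrewer palette. Like the loop above, it repeats the shared color where two segments meet, which I have left in to mirror the original behavior:

```r
## Interpolate per_pair colors between each adjacent pair of palette colors
interp_pairs <- function(pal, per_pair = 10) {
  segments <- lapply(seq_len(length(pal) - 1), function(i) {
    colorRampPalette(colors = c(pal[i], pal[i + 1]), space = "Lab")(per_pair)
  })
  unlist(segments)
}

## Stand-in diverging palette (not an actual ColorBrewer scheme)
pal  <- c("darkred", "orange", "white", "skyblue", "navy")
ramp <- interp_pairs(pal, per_pair = 10)

length(ramp)  # 4 adjacent pairs x 10 colors each
```

With an 11-color palette and 10 colors per pair, this yields the 100 colors the function above is built around.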
In my work I am using this scheme to plot a raster layer (rs) using spplot, like so:
brks<-diverge.color(values(rs))
spplot(rs,col.regions=brks[[2]],at=brks[[1]]$brks,colorkey=TRUE)