R heatmap with diverging colour palette

I am trying to create a simple heatmap in R, using a diverging colour palette. I want to use a gradient so that all numbers below a threshold N are designated a color (say purple), and all numbers above the threshold are designated another color (say orange). The further away the number is from the threshold, the darker the color should be.

Here is a sample dataset:

Division,COL1,COL2,COL3,COL4,COL5,COL6,COL7
Division 1,31.9221884012222,75.8181694429368,97.0480443444103,96.295954938978,70.5677134916186,63.0451830103993,93.0396212730557
Division 2,85.7012346852571,29.0621076244861,16.9130333233625,94.6443660184741,19.9103083927184,61.9562198873609,72.3791105207056
Division 3,47.1665125340223,99.4153356179595,8.51091076619923,79.1276383213699,41.915355855599,7.45079894550145,24.6946100145578
Division 4,66.0743870772421,24.6163331903517,78.694460215047,42.04714265652,50.2694897353649,73.0409651994705,87.3745442833751
Division 5,29.6664374880493,35.4036891367286,19.2967326845974,5.48460693098605,32.4517334811389,15.5926876701415,76.0523204226047
Division 6,95.4969164915383,8.63230894319713,61.7535551078618,24.5590241160244,25.5453423131257,56.397921172902,44.4693325087428
Division 7,87.5015622004867,28.7770316936076,56.5095080062747,34.6680747810751,28.1923673115671,65.0204187724739,13.795713102445
Division 8,70.1077231671661,72.4712177179754,38.4903231170028,36.1821102909744,97.0875509083271,17.184783378616,78.2292529474944
Division 9,47.3570406902581,90.2257485780865,65.6037972308695,77.0234781783074,25.6294377148151,84.900529962033,82.5080851092935
Division 10,58.0811711959541,0.493217632174492,58.5604055318981,53.5780876874924,9.12552657537162,20.313960686326,78.1371118500829
Division 11,34.6708688884974,76.711881859228,22.6064443588257,22.1724311355501,5.48891355283558,79.1159523651004,56.8405059166253
Division 12,33.6812808644027,44.1363711375743,70.6362190190703,3.78900407813489,16.6075889021158,9.12654218263924,39.9711143691093

Here is a simple snippet to produce a heatmap from the above data:

data <- read.csv("dataset.csv", sep=",")
row.names(data) <- data$Division
data <- data[, -1]  # drop the Division column, keep COL1 to COL7
data_matrix <- data.matrix(data) 
heatmap(data_matrix, Rowv=NA, Colv=NA, col = heat.colors(256), scale="column", margins=c(5,10))

How can I modify the above code to produce:

  • a color gradient (orange) for all numbers ABOVE 50 (darker the further the number is from 50)
  • a color gradient (purple) for all numbers BELOW 50 (darker the further the number is from 50)
  • Nice to have (but optional) write the number value in the grid cell
  • Nice to have (but optional), use a different color for grid cell that is EXACTLY the threshold number (50 in this case)

[[Edit]]

I have just seen this question on SO, which seems to be very similar. The answer uses ggplot (which I have no experience with), and so far I have been unable to adapt the ggplot solution to my slightly more complicated data.

Homunculus Reticulli asked Dec 12 '22 01:12

2 Answers

This should get you most of the way. (Note that you'll need to set scale="none" if you want the plotted colors to correspond to the actual (rather than the rescaled) values of the cells).

ncol <- 100

## Make a vector with ncol colors
cols <- RColorBrewer::brewer.pal(11, "PuOr")  # OR c("purple","white","orange")
rampcols <- colorRampPalette(colors = cols, space="Lab")(ncol)
## Color the bin just above the midpoint green
rampcols[(ncol/2) + 1] <- rgb(t(col2rgb("green")), maxColorValue=255)

## Make a vector with ncol+1 breaks
rampbreaks <- seq(0, 100, length.out = ncol+1)

## Try it out
heatmap(data_matrix, Rowv = NA, Colv = NA, scale="none",
        col = rampcols, breaks = rampbreaks)
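
To see which palette entry a given cell value picks up, it can help to bin a few values against the breaks by hand. This is only a sketch: it assumes the bins are closed on the right, which is my understanding of how image() (which heatmap() draws with) treats breaks, and vals is just an illustrative vector, not part of your data.

```r
ncol <- 100
rampbreaks <- seq(0, 100, length.out = ncol + 1)

## Index of the color bin each value falls into, assuming right-closed
## intervals (b[i], b[i+1]] with the lowest break included in the first bin
vals <- c(0, 49.5, 50, 50.5, 100)
bin <- cut(vals, breaks = rampbreaks, labels = FALSE, include.lowest = TRUE)
print(data.frame(value = vals, bin = bin))
```

Under that assumption a value of exactly 50 lands in bin 50, so a special color at index 51 marks values just above 50 rather than the threshold itself.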

EDIT

For finer control over the placement of the threshold, I'd suggest creating two separate palettes -- one for values less than the threshold and one for values above the threshold -- and then "suturing" them together. Try something like this, playing around with different values for Min, Max, Thresh, etc.:

nHalf <- 50

Min <- 0
Max <- 100
Thresh <- 50

## Make vector of colors for values below threshold
rc1 <- colorRampPalette(colors = c("purple", "white"), space="Lab")(nHalf)    
## Make vector of colors for values above threshold
rc2 <- colorRampPalette(colors = c("white", "orange"), space="Lab")(nHalf)
rampcols <- c(rc1, rc2)
## In your example, this line sets the color for values between 49 and 51.
rampcols[c(nHalf, nHalf+1)] <- rgb(t(col2rgb("green")), maxColorValue=255)

rb1 <- seq(Min, Thresh, length.out=nHalf+1)
rb2 <- seq(Thresh, Max, length.out=nHalf+1)[-1]
rampbreaks <- c(rb1, rb2)

heatmap(data_matrix, Rowv = NA, Colv = NA, scale="none",
        col = rampcols, breaks = rampbreaks)
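
For the optional request to print the value in each cell, one possible route is to draw the grid with image() and put text() on top, reusing the same colors and breaks. The following is a minimal sketch, not a drop-in replacement for heatmap(): the matrix m is a stand-in built from the first three rows and columns of the sample data (rounded to one decimal), and the margin and cex settings are arbitrary choices.

```r
## Stand-in for data_matrix: first 3 rows/columns of the sample data, rounded
m <- matrix(c(31.9, 75.8, 97.0,
              85.7, 29.1, 16.9,
              47.2, 99.4,  8.5),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste("Division", 1:3), paste0("COL", 1:3)))

nHalf <- 50; Min <- 0; Max <- 100; Thresh <- 50
rampcols <- c(colorRampPalette(c("purple", "white"), space = "Lab")(nHalf),
              colorRampPalette(c("white", "orange"), space = "Lab")(nHalf))
rampbreaks <- c(seq(Min, Thresh, length.out = nHalf + 1),
                seq(Thresh, Max, length.out = nHalf + 1)[-1])

## image() draws z[i, j] at (x[i], y[j]), so flip the rows and transpose
## to keep the first row of m at the top of the plot
z <- t(m[nrow(m):1, , drop = FALSE])

par(mar = c(3, 7, 1, 1))
image(x = seq_len(nrow(z)), y = seq_len(ncol(z)), z = z,
      col = rampcols, breaks = rampbreaks,
      axes = FALSE, xlab = "", ylab = "")
axis(1, at = seq_len(nrow(z)), labels = rownames(z), tick = FALSE)
axis(2, at = seq_len(ncol(z)), labels = colnames(z), las = 1, tick = FALSE)

## Write each value at the center of its cell
text(x = row(z), y = col(z), labels = sprintf("%.1f", z), cex = 0.8)
```

Swapping m for data_matrix should work the same way, as long as all values lie between Min and Max (values outside the breaks are left blank by image()).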
Josh O'Brien answered Jan 06 '23 19:01


I found this thread very useful and also pulled some ideas from here, but for my purposes I needed to generalize some things and wanted to use the RColorBrewer package. While I was working on it, Dr. Brewer (of ColorBrewer fame) stopped by my office and told me I needed to interpolate within the smaller color breaks rather than just pick the end points. I thought others might find this useful, so I am posting my function here for posterity.

The function takes in your data vector, the name of a diverging ColorBrewer palette, and the center point for your color scheme (default is 0). It outputs a list containing 2 objects: a classIntervals object and a vector of colors. The function is set to interpolate a total of 100 colors, but that can be modified with some care.

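library(RColorBrewer)  # brewer.pal()
library(classInt)      # classIntervals()

diverge.color <- function(data, pal_choice = "RdGy", centeredOn = 0) {
  nHalf <- 50
  Min <- min(data, na.rm = TRUE)
  Max <- max(data, na.rm = TRUE)
  Thresh <- centeredOn
  ## 11 palette colors give 10 adjacent pairs; interpolate 10 colors
  ## within each pair for 100 colors in total
  pal <- brewer.pal(n = 11, pal_choice)
  rc1 <- colorRampPalette(colors = c(pal[1], pal[2]), space = "Lab")(10)
  for (i in 2:10) {
    tmp <- colorRampPalette(colors = c(pal[i], pal[i + 1]), space = "Lab")(10)
    rc1 <- c(rc1, tmp)
  }
  ## nHalf bins below the center point and nHalf bins above it
  rb1 <- seq(Min, Thresh, length.out = nHalf + 1)
  rb2 <- seq(Thresh, Max, length.out = nHalf + 1)[-1]
  rampbreaks <- c(rb1, rb2)
  cuts <- classIntervals(data, style = "fixed", fixedBreaks = rampbreaks)
  return(list(cuts, rc1))
}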

In my work I am using this scheme to plot a raster layer (rs) using spplot, like so:

## Assumes the raster package is loaded and rs is a raster layer
brks <- diverge.color(values(rs))
spplot(rs, col.regions = brks[[2]], at = brks[[1]]$brks, colorkey = TRUE)
csfowler answered Jan 06 '23 19:01