
generating a heatmap using R or Python

Tags:

r

My problem may not be well suited to SO, but I am looking for a solution (in R or Python, preferably R) to create heatmaps for data whose values sit at two extremes. Consider the following data.

+----+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+
| …  |     X1      |     X2      |     X3      |     X4      |     X5      |     X6      |     X7      |     X8      |     X9      |     X10     |     X11     |     X12     |
+----+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+
|  1 | 0.960023745 | 0.006412462 | 0.002413886 | 1.75E-06    | 1.33E-07    | 6.53E-07    | 0.000789362 | 1.56E-07    | 0.027248026 | 2.54E-05    | 0.000108822 | 0.002949816 |
|  2 | 0.013783554 | 0.960582857 | 0.010711838 | 0.003933983 | 0.002573642 | 0.001472307 | 0.000319789 | 0.000195265 | 1.87E-05    | 1.29E-06    | 0.004194081 | 0.002209041 |
|  3 | 0.000839561 | 0.005466858 | 0.944159921 | 0.023892784 | 0.001752099 | 0.000828122 | 0.000493376 | 1.84E-06    | 0.011739846 | 0.000879784 | 9.53E-05    | 0.00980562  |
|  4 | 2.26E-08    | 0.004108291 | 0.010781282 | 0.966410413 | 0.010459999 | 3.04E-05    | 1.64E-06    | 0.001983494 | 0           | 0.000225223 | 0.002846474 | 0.0031448   |
|  5 | 0           | 0.003175902 | 0.002023363 | 0.010022482 | 0.919020424 | 0.032083951 | 0.001814906 | 0.030203657 | 2.02E-06    | 7.07E-05    | 0.001165208 | 0.000413012 |
|  6 | 7.34E-08    | 0.002817014 | 0.000931738 | 7.01E-05    | 0.026999736 | 0.947850807 | 0.003017895 | 0.017994113 | 0           | 0.00011791  | 0.000194055 | 0           |
|  7 | 0.001857195 | 0.000220267 | 0.001523402 | 1.23E-05    | 0.001915852 | 0.010193007 | 0.960227998 | 0.012040256 | 0.007093175 | 0.001441301 | 0.002149965 | 0.001306157 |
|  8 | 0           | 0.000337953 | 0           | 0.00536237  | 0.030409165 | 0.01670267  | 0.009929247 | 0.936720524 | 0           | 0           | 0.000503316 | 3.12E-05    |
|  9 | 0.00350741  | 2.38E-06    | 0.002294787 | 1.17E-06    | 9.38E-08    | 8.74E-08    | 0.000252812 | 4.25E-10    | 0.984092182 | 0.003173648 | 2.42E-05    | 0.006649569 |
| 10 | 0.000126558 | 4.85E-05    | 0.001686418 | 0.000202837 | 3.87E-05    | 9.82E-05    | 0.000425687 | 0           | 0.013116146 | 0.983428814 | 5.28E-05    | 0.000776452 |
| 11 | 0.000170592 | 0.002728779 | 0.000117028 | 0.002794149 | 0.000621607 | 0.000224662 | 0.000969203 | 0.000299963 | 0.000629235 | 4.68E-05    | 0.991344498 | 5.02E-05    |
| 12 | 0.004371355 | 0.001246307 | 0.02523568  | 0.007498292 | 0.000186287 | 6.00E-07    | 0.000956249 | 2.93E-05    | 0.0590514   | 0.001253133 | 8.40E-05    | 0.900059314 |
+----+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+

Consider the first row. The X1 entry is far larger than the other entries in that row. The same holds for every row: the diagonal entry dominates. The heatmap this data generates looks like the following:

[image: heatmap of the data above]

As you can see, the diagonal is very strong compared to the other colors (this is visible in the data and is expected). I am trying to find a way to "darken" the other colors. I'm mainly looking for a ggplot solution. Nothing I've tried works.

The code for R right now is

heatmap(data.matrix(result_matrix), Rowv=NA, Colv=NA, col = rev(heat.colors(256)), margins=c(5,10))

asked Mar 19 '26 08:03 by masfenix


1 Answer

The basic idea is to put the fill colors on a logarithmic scale. Here is a ggplot solution.

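library(ggplot2)
library(reshape2)
# df is assumed to hold the table above as a data frame,
# e.g. df <- as.data.frame(result_matrix)
df$id <- rownames(df)
gg <- melt(df,id="id")   # long format: one row per (id, variable, value)
ggplot(gg, aes(x=variable,y=id,fill=value))+
  geom_tile()+
  # log10 color scale; tiles that cannot be mapped (the zeros) are drawn white
  scale_fill_gradientn(colours=rev(heat.colors(10)),
                       trans="log10",na.value="white")+
  coord_fixed()+
  scale_x_discrete(expand=c(0,0))+scale_y_discrete(expand=c(0,0))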

The key here is trans="log10" in the call to scale_fill_gradientn(...). One problem with logs is that your data contains zeros: log10(0) is -Inf, which cannot be mapped to a color. Using na.value="white" deals with that by drawing those tiles white (you could choose another color if that suits your use case).
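To see the zero problem in isolation, here is a minimal base R sketch using three values copied from the question's table. The flooring step at the end is only an illustrative assumption, not part of the solution above: it changes what the plot shows for zero cells, so use it only if a colored tile is preferable to a white one.

```r
# Three values from the question's table: a diagonal entry, a tiny entry, an exact zero
x <- c(0.960023745, 1.75e-06, 0)

log_x <- log10(x)
log_x               # the last element is -Inf
is.finite(log_x)    # TRUE TRUE FALSE

# Hypothetical alternative: floor zeros at half the smallest positive value,
# so every tile gets a color from the gradient instead of na.value
floor_val <- min(x[x > 0]) / 2
x_floored <- ifelse(x == 0, floor_val, x)
log10(x_floored)    # all finite now
```

If you go this route, it is probably worth saying so in the legend or caption, since the floored cells no longer show their true value.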

The calls to scale_x... and scale_y... are just to compress the axes so the tiles cover the whole plot (ggplot adds a bit of empty space by default which is distracting in heatmaps).

EDIT: Response to OP's comment.

This business of "making the diagonal pop out more" is an aesthetic choice which has almost nothing to do with the data, and will probably lead to a misleading graphic. I do not recommend it. Having said that, you can always choose a different transformation.

# reorder the y-axis so the rows keep their original order
# (as character strings the ids sort as "1", "10", "11", ...); should not be necessary...
gg$id <- factor(gg$id,levels=unique(gg$id))
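As a small aside on why that reordering line matters, the sketch below uses only base R and assumes the ids are the default row names "1" to "12" stored as character strings, as they would be after df$id <- rownames(df).

```r
# Row ids as character strings, as produced by rownames()
id <- as.character(1:12)

# Default factor levels are sorted as text, so "10" comes before "2"
default_levels <- levels(factor(id))
default_levels

# Passing levels = unique(id) keeps the order in which the ids first appear
ordered_levels <- levels(factor(id, levels = unique(id)))
ordered_levels
```

A discrete axis follows the factor levels, so the second form gives rows 1 through 12 in sequence.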

# square root scale
ggplot(gg, aes(x=variable,y=id,fill=value))+
  geom_tile()+
  scale_fill_gradientn(colours=rev(heat.colors(10)),
                       trans="sqrt",na.value="white")+
  coord_fixed()+
  scale_x_discrete(expand=c(0,0))+scale_y_discrete(expand=c(0,0))

# logit scale; need to set breaks=... to avoid labels overlapping
ggplot(gg, aes(x=variable,y=id,fill=value))+
  geom_tile()+
  scale_fill_gradientn(colours=rev(heat.colors(10)),
                       trans="logit",na.value="white",breaks=5*10^-(0:8))+
  coord_fixed()+
  scale_x_discrete(expand=c(0,0))+scale_y_discrete(expand=c(0,0))
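To get a rough feel for how these transformations differ, the following base R sketch computes where three values from row 1 of the question's table would land on a 0 to 1 color gradient under each one. It is only an approximation: the helper position() is a made-up name for illustration, and it rescales against these three values alone, whereas ggplot rescales against the range of the whole data set.

```r
# Three values taken from row 1 of the question's table
vals <- c(diag = 0.960023745, mid = 0.006412462, tiny = 1.75e-06)

# Hypothetical helper: transform, then rescale to [0, 1]
# (0 = one end of the gradient, 1 = the other end)
position <- function(x, f) {
  t <- f(x)
  (t - min(t)) / (max(t) - min(t))
}

pos <- rbind(
  linear = position(vals, identity),
  sqrt   = position(vals, sqrt),
  log10  = position(vals, log10),
  logit  = position(vals, qlogis)   # qlogis(p) is log(p / (1 - p))
)
round(pos, 3)
```

On the linear scale the "mid" value sits almost at the bottom of the gradient, which is why the off-diagonal tiles look washed out. The square root lifts it a little, and the log and logit scales lift it much further.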

answered Mar 21 '26 21:03 by jlhoward