I am trying to create a heatmap using the heatmap.2 function from the gplots package. My data has lots of NaN values in it, and what I would like to do is the following: every time there is a NaN value, simply color the cell light grey (or some other neutral color, maybe white), and give all of the other values (which are log2 expression) a standard green/yellow/red coloring scheme. Here is the code that I have been using:
heatmap.2(as.matrix(foo2[rowSums (abs(foo2)) != 0,]),
col = redgreen,
margins = c(12, 22),
trace = "none",
xlab = "Comparison",
lhei = c(2, 8),
scale = c("none"),
symbreaks = min(foo2 = 0, na.rm = TRUE),
na.color = "blue",
cexRow = 0.5,
cexCol = .7,
main = "DE geness",
Colv = F)
This works well when there are no NaN values, but when the data has NaN values, I get an error which says:
Error in hclustfun(distfun(x)) :
NA/NaN/Inf in foreign function call (arg 11)
Essentially, I would like to have it ignore the NaNs in the data. I am not sure how to handle this. Any help would be greatly appreciated.
TL;DR: The issue is likely due to the delegated distfun and not the heatmap.2 function itself. The default dist function tries to calculate the distance between your data points, and if the distance calculation returns an NA, the clustering function cannot handle that.
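One possible workaround, sketched here under assumptions that are not part of the original post, is to wrap dist so that it never hands an NA to the clustering function. The helper name na_safe_dist is hypothetical, and filling undefined distances with the largest finite distance is an arbitrary choice that treats rows with nothing in common as maximally dissimilar.

```r
# Hypothetical helper (not part of gplots): a dist() wrapper that replaces
# undefined distances so that hclust() does not receive NA values.
na_safe_dist <- function(x, ...) {
  d <- dist(x, ...)
  if (anyNA(d)) {
    finite <- as.vector(d)[is.finite(as.vector(d))]
    # Assumption: an undefined distance is treated as the largest observed one.
    fill <- if (length(finite) > 0) max(finite) else 0
    d[is.na(d)] <- fill            # sub-assignment keeps the "dist" class
  }
  d
}

# Made-up rows: rows 1 and 2 have no column observed in both.
m <- rbind(c(1, NA),
           c(NA, 2),
           c(3, 4))
d <- na_safe_dist(m)
hc <- hclust(d)                    # succeeds once the NA has been filled
```

Such a wrapper could then be supplied to heatmap.2 through its distfun argument (distfun = na_safe_dist). If no clustering is needed at all, setting Rowv = FALSE, Colv = FALSE and dendrogram = "none" should avoid the distance calculation altogether.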
The longer version:
I have recently experienced the same issue as the OP, and had to dig in quite a bit to understand why the problem wasn't reproducible for others.
The essential issue is as follows: heatmap.2 by default passes hclust as hclustfun and dist as distfun. The error message clearly states that it's hclustfun (which in this case defaults to hclust) that does not like the NAs.
The next bit of information is this: even though the data matrix includes NAs, the result of dist (which is passed to hclust) might be free of NAs, which is the case for @kdauria's answer. See below:
> library(gplots)
> mat = matrix( rnorm(25), 5, 5)
> mat[c(1,6,8,11,15,20,22,24)] = NaN
>
> heatmap.2( mat,
+ col = colorpanel(100,"red","yellow","green"),
+ margins = c(12, 22),
+ trace = "none",
+ xlab = "Comparison",
+ lhei = c(2, 8),
+ scale = c("none"),
+ symbreaks = min(mat, na.rm=TRUE),
+ na.color="blue",
+ cexRow = 0.5, cexCol = 0.7,
+ main = "DE genes",
+ dendrogram = "row",
+ Colv = FALSE )
> ?dist
> mat
[,1] [,2] [,3] [,4] [,5]
[1,] NaN NaN NaN -1.10103187 -1.4396185
[2,] -0.8821449 1.4891180 0.41956063 -0.06442867 NaN
[3,] -2.5912928 NaN -0.56603029 -0.55177559 -2.0313602
[4,] 0.8348197 0.2199583 0.06318663 1.59697764 NaN
[5,] -0.2632078 -1.2193110 NaN NaN 0.8618543
> dist(mat)
1 2 3 4
2 2.317915
3 1.276559 2.623637
4 6.032933 3.050821 5.283828
5 5.146250 4.392798 5.871684 2.862324
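The result above is free of NAs because, for each pair of rows, dist leaves out the columns where either value is missing and scales the sum up in proportion to the number of columns actually used. A minimal sketch with two made-up rows (not taken from the data above) illustrates this for the default Euclidean distance:

```r
# Two made-up rows; only column 1 is observed in both of them.
x <- rbind(c(1, NA),
           c(3, 4))

d <- dist(x)                       # Euclidean distance by default

# Squared difference over the shared column, scaled by
# (total number of columns) / (number of columns used) = 2 / 1.
by_hand <- sqrt((3 - 1)^2 * 2 / 1)

print(as.numeric(d))
print(by_hand)
```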
The random-valued matrix does not reproduce the problem because dist never returns an NA for it. Which brings me to the question: what does it take to get NAs from dist?
My data had some large outlying values, which I thought were the reason; however, I only managed to reproduce the problem by adding a row of NAs:
> mat = matrix(rnorm(49), 7, 7)
> mat[c(3,17,28, 41)] = mat[c(3,17,28, 41)] * 100000
> mat
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] -6.175928e-01 1.68691561 -1.233250e+00 -7.355322e-01 -0.37392178 3.559804e-01 1.7536137
[2,] 6.680429e-01 0.90590237 -1.375424e+00 5.842512e-01 -0.09376548 -3.556098e-01 -1.2926535
[3,] -3.739372e+04 -1.74534887 -2.241643e+05 -2.209226e-01 -0.86769435 -4.590908e-01 1.6306854
[4,] -1.283405e+00 0.20698245 3.635557e-01 3.673208e-01 -0.12339047 1.119922e+00 0.4301094
[5,] -5.430687e-02 -0.75219479 2.609126e+00 -1.340564e-01 0.54016622 2.885021e-01 0.9237946
[6,] -8.395116e-01 0.03675002 2.455545e+00 4.432025e-02 -0.86194910 1.302758e+05 0.6062505
[7,] 1.817036e-01 -1.46137388 -1.853179e+00 -2.177306e+03 2.36763806 -2.273134e+00 1.2440088
> dist(mat)
1 2 3 4 5 6
2 3.726858e+00
3 2.272605e+05 2.272606e+05
4 2.966078e+00 3.537475e+00 2.272620e+05
5 4.787577e+00 5.039154e+00 2.272644e+05 3.016614e+00
6 1.302754e+05 1.302762e+05 2.619559e+05 1.302747e+05 1.302755e+05
7 2.176576e+03 2.177895e+03 2.272705e+05 2.177679e+03 2.177179e+03 1.302963e+05
> mat = rbind(mat[1:4, ], rep(NA,7), mat[5:6, ])
> mat
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] -6.175928e-01 1.68691561 -1.233250e+00 -0.73553223 -0.37392178 3.559804e-01 1.7536137
[2,] 6.680429e-01 0.90590237 -1.375424e+00 0.58425125 -0.09376548 -3.556098e-01 -1.2926535
[3,] -3.739372e+04 -1.74534887 -2.241643e+05 -0.22092261 -0.86769435 -4.590908e-01 1.6306854
[4,] -1.283405e+00 0.20698245 3.635557e-01 0.36732078 -0.12339047 1.119922e+00 0.4301094
[5,] NA NA NA NA NA NA NA
[6,] -5.430687e-02 -0.75219479 2.609126e+00 -0.13405635 0.54016622 2.885021e-01 0.9237946
[7,] -8.395116e-01 0.03675002 2.455545e+00 0.04432025 -0.86194910 1.302758e+05 0.6062505
> dist(mat)
1 2 3 4 5 6
2 3.726858e+00
3 2.272605e+05 2.272606e+05
4 2.966078e+00 3.537475e+00 2.272620e+05
5 NA NA NA NA
6 4.787577e+00 5.039154e+00 2.272644e+05 3.016614e+00 NA
7 1.302754e+05 1.302762e+05 2.619559e+05 1.302747e+05 NA 1.302755e+05
> heatmap.2( mat,
+ col = colorpanel(100,"red","yellow","green"),
+ margins = c(12, 22),
+ trace = "none",
+ xlab = "Comparison",
+ lhei = c(2, 8),
+ scale = c("none"),
+ symbreaks = min(mat, na.rm=TRUE),
+ na.color="blue",
+ cexRow = 0.5, cexCol = 0.7,
+ main = "DE genes",
+ dendrogram = "row",
+ Colv = FALSE )
Error in hclustfun(distfun(x)) :
NA/NaN/Inf in foreign function call (arg 11)
However, the situation does not appear to be specific to the case where a row is composed entirely of NAs. For example:
> mat
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] NaN NaN NaN NaN NA -7.531027e-01 0.2238252
[2,] 3.210084e-01 -1.55702840 2.777516e-01 0.2176875 1.3310334 -9.621561e-01 NaN
[3,] 1.159837e+05 0.04480172 -1.649482e+04 NaN 2.4748122 8.446133e-01 -0.4240776
[4,] -8.584051e-01 NaN NaN 1.0557713 -1.0855826 -5.638023e-02 -0.3789979
[5,] NA NA -2.539003e-01 -0.4552776 0.3856384 NA NA
[6,] NaN 1.31986556 NaN -1.0393147 -1.9197183 -1.434064e+00 0.6334569
[7,] NaN -0.42180912 NaN -0.8023476 -0.8264077 4.471358e+04 0.5046408
> dist(mat)
1 2 3 4 5 6
2 5.531033e-01
3 3.225471e+00 1.386143e+05
4 1.723619e+00 3.913983e+00 1.534332e+05
5 NA 1.949799e+00 3.085851e+04 3.945524e+00
6 1.486699e+00 6.010961e+00 6.905415e+00 3.743585e+00 4.449179e+00
7 8.365286e+04 5.915178e+04 5.914939e+04 5.915058e+04 2.358664e+00 5.290752e+04
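In this last matrix the single NA distance is between rows 1 and 5: row 1 only has values in columns 6 and 7, while row 5 only has values in columns 3 to 5, so there is no column in which both rows are observed. The sketch below is one way to look for such pairs before clustering. The helper name shared_obs and the small matrix are made up for illustration.

```r
# Hypothetical helper: for every pair of rows, count the columns in which
# both rows have a non-missing value (is.na() treats NaN as missing too).
shared_obs <- function(x) {
  observed <- !is.na(x)
  tcrossprod(observed * 1)         # n x n matrix of shared-column counts
}

# Made-up matrix: rows 1 and 2 have no observed column in common.
m <- rbind(c(NA, NA, 1, 2),
           c(3, 4, NA, NA),
           c(5, 6, 7, 8))

shared <- shared_obs(m)

# Row pairs (upper triangle only) that share no observed column.
bad_pairs <- which(shared == 0 & upper.tri(shared), arr.ind = TRUE)
print(bad_pairs)

# These are the pairs for which dist() has nothing to compute.
print(is.na(as.matrix(dist(m))))
```

Rows that turn up in bad_pairs could be dropped, imputed, or handled by a custom distfun, depending on what makes sense for the data.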