
Why Pearson correlation output is NaN?

I'm trying to get the Pearson correlation coefficient between two variables in R. This is the scatterplot of the variables:

ggplot(results_summary, aes(x =D_in, y = D_ex)) + geom_point(col=ifelse(results_summary$FDR < 0.05, ifelse(results_summary$logF>0, "red", "green" ), "black"))

[Image: scatterplot of D_ex against D_in]

As you can see, the variables correlate pretty well, so I'm expecting a high correlation coefficient. However, when I try to get the Pearson correlation coefficient, I get NaN!

> cor(results_summary$D_in, results_summary$D_ex, method="spearman")
[1] 0.868079
> cor(results_summary$D_in, results_summary$D_ex, method="kendall")
[1] 0.6973086
> cor(results_summary$D_in, results_summary$D_ex, method="pearson")
[1] NaN

I checked if my data contains any NaN:

> nrow(subset(results_summary, is.nan(results_summary$D_ex)==TRUE)) 
[1] 0
> nrow(subset(results_summary, is.nan(results_summary$D_in)==TRUE)) 
[1] 0
> cor(results_summary$D_in, results_summary$D_ex, method="pearson", use="complete.obs")
[1] NaN

But it seems that is not the reason for the resulting NaN. Can someone give me any clue about what might be happening here?

Thanks for your time!

Geparada asked Aug 06 '15 11:08




1 Answer

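That seems odd. My guess is that there is some problem with the input data which was not revealed by the check you mentioned. Note that is.nan() does not flag Inf or -Inf, and use="complete.obs" only drops missing values (NA/NaN), so infinite values would pass both. I suggest running: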

any(!is.finite(results_summary$D_in))

any(!is.finite(results_summary$D_ex))
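To see why this check can find something that is.nan() misses, here is a minimal sketch on a made-up vector (the values in `v` are hypothetical, not taken from your data). It also shows how which() could be used to locate the offending positions:

```r
# Hypothetical toy vector mixing ordinary, infinite and missing values
v <- c(0.5, 1.2, Inf, -Inf, NA, NaN)

# is.nan() is TRUE only for NaN; Inf, -Inf and NA are not flagged
nan_flags <- is.nan(v)

# is.finite() is FALSE for Inf, -Inf, NA and NaN alike
finite_flags <- is.finite(v)

# Positions of every value that is not an ordinary finite number
bad_rows <- which(!finite_flags)

print(nan_flags)
print(finite_flags)
print(bad_rows)
```

On your data frame the same idea would be something like results_summary[!is.finite(results_summary$D_in), ], assuming the column names from your question.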

You could also try calculating Pearson's correlation by hand, to try to get some insight on where the problem is (in the numerator and/or denominator?):

pearson_num = cov(results_summary$D_in, results_summary$D_ex, use="complete.obs")

pearson_den = c(sd(results_summary$D_in), sd(results_summary$D_ex))
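As an illustration of what such a by-hand calculation might reveal, here is a sketch on made-up data where one value is Inf. This is only an assumption about the cause (I have not seen your data); `x`, `y` and the other names below are hypothetical. With an infinite value, both the covariance and one standard deviation should come out as NaN, while the rank-based Spearman coefficient is unaffected because it only uses the ordering:

```r
# Hypothetical toy data: y contains a single infinite value
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 8.1, Inf)

# Numerator and denominator of Pearson's r, computed separately
pearson_num <- cov(x, y)
pearson_den <- c(sd(x), sd(y))
r_by_hand <- pearson_num / prod(pearson_den)

# Spearman works on ranks, so Inf is simply the largest rank
rho_spearman <- cor(x, y, method = "spearman")

# Pearson restricted to pairs where both values are finite
ok <- is.finite(x) & is.finite(y)
r_finite <- cor(x[ok], y[ok], method = "pearson")

print(c(num = pearson_num, den_x = pearson_den[1], den_y = pearson_den[2]))
print(c(by_hand = r_by_hand, spearman = rho_spearman, finite_only = r_finite))
```

If your data behave like this toy example, filtering with is.finite() before calling cor() would give a usable Pearson coefficient, though it is worth first checking where the infinite values come from.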

tguzella answered Oct 29 '22 03:10