I'm trying to get the Pearson correlation coefficient between two variables in R. This is the scatterplot of the variables:
ggplot(results_summary, aes(x = D_in, y = D_ex)) + geom_point(col = ifelse(results_summary$FDR < 0.05, ifelse(results_summary$logF > 0, "red", "green"), "black"))
As you can see, the variables correlate pretty well, so I'm expecting a high correlation coefficient. However, when I try to compute the Pearson correlation coefficient, I get NaN!
> cor(results_summary$D_in, results_summary$D_ex, method="spearman")
[1] 0.868079
> cor(results_summary$D_in, results_summary$D_ex, method="kendall")
[1] 0.6973086
> cor(results_summary$D_in, results_summary$D_ex, method="pearson")
[1] NaN
I checked if my data contains any NaN:
> nrow(subset(results_summary, is.nan(results_summary$D_ex)==TRUE))
[1] 0
> nrow(subset(results_summary, is.nan(results_summary$D_in)==TRUE))
[1] 0
> cor(results_summary$D_in, results_summary$D_ex, method="pearson", use="complete.obs")
[1] NaN
But that does not seem to be the reason for the resulting NaN. Can someone give me a clue about what might be happening here?
Thanks for your time!
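The NaN, in this case, does not mean that there is no correlation between the two variables; it means the coefficient could not be computed. A value of 0, not NaN, is what indicates no linear correlation.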
The Pearson correlation measures the strength of the linear relationship between two variables. It takes a value between -1 and 1, with -1 meaning a total negative linear correlation, 0 meaning no linear correlation, and +1 meaning a total positive linear correlation.
Pearson's correlation is a measure of the linear relationship between two continuous random variables. It does not assume normality although it does assume finite variances and finite covariance. When the variables are bivariate normal, Pearson's correlation provides a complete description of the association.
If the test shows that the population correlation coefficient ρ is close to zero, then we say there is insufficient statistical evidence that the correlation between the two variables is significant, i.e., the correlation occurred on account of chance coincidence in the sample and it's not present in the entire ...
That seems odd. My guess is that there is some problem with the input data (which was not revealed by the check you mentioned). I suggest running:
any(!is.finite(results_summary$D_in))
any(!is.finite(results_summary$D_ex))
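To illustrate why this check could matter, here is a small sketch with made-up vectors (x and y are hypothetical, not your data). is.nan() does not flag Inf, yet a single infinite value is enough to make Pearson's coefficient NaN, while the rank-based methods still return a number. Whether this is what is happening in your data is only a guess until you run the checks above.

```r
# Toy data, made up for illustration; one value is infinite
x <- c(1, 2, 3, 4, Inf)
y <- c(2, 4, 5, 8, 10)

any(is.nan(x))      # FALSE: Inf is not NaN, so an is.nan() check passes
any(!is.finite(x))  # TRUE: is.finite() is FALSE for Inf, -Inf, NA and NaN

# Pearson uses the actual values, so the infinite mean propagates
r_pearson <- cor(x, y, method = "pearson")

# Spearman works on ranks only, so Inf is just "the largest value"
r_spearman <- cor(x, y, method = "spearman")

# Pearson restricted to rows where both values are finite
ok <- is.finite(x) & is.finite(y)
r_finite <- cor(x[ok], y[ok], method = "pearson")

print(c(pearson = r_pearson, spearman = r_spearman, finite_only = r_finite))
```

If the checks do turn up non-finite values, filtering on is.finite() as in the last step is one possible way to proceed, but it is worth first finding out where those values come from.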
You could also try calculating Pearson's correlation by hand, to try to get some insight on where the problem is (in the numerator and/or denominator?):
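pearson_num = cov(results_summary$D_in, results_summary$D_ex, use="complete.obs")
pearson_den = c(sd(results_summary$D_in), sd(results_summary$D_ex))
# Pearson's r is the covariance divided by the product of the two standard deviations
pearson_num / prod(pearson_den)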
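As a sanity check of the by-hand approach, here is a minimal sketch on made-up vectors (x and y are hypothetical stand-ins for D_in and D_ex). It computes the numerator and denominator separately, so that a NaN, NA or Inf in either piece would be visible, and compares the result with cor().

```r
# Toy data, made up for illustration
x <- c(1.2, 2.3, 3.1, 4.8, 5.0)
y <- c(0.9, 2.1, 2.9, 5.2, 4.7)

# Numerator: sample covariance; denominator: product of sample standard deviations
num <- cov(x, y)
den <- sd(x) * sd(y)

# Inspect the pieces individually to see which one misbehaves
print(c(numerator = num, sd_x = sd(x), sd_y = sd(y)))

# Pearson's r computed by hand, compared with the built-in
r_manual <- num / den
print(all.equal(r_manual, cor(x, y, method = "pearson")))
```

On well-behaved data the two values should agree; with your data, whichever piece prints as NaN, NA or Inf points to where the problem is.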