I've got a huge data set with six columns (call them A, B, C, D, E, F) and about 450,000 rows. I simply tried to find the correlation between columns A and B:
cor(A, B)
and I got
[1] NA
as a result. What can I do to fix this problem?
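Try cor(A, B, use = "pairwise.complete.obs"). By default, cor returns NA whenever either vector contains a missing value; with this option the correlation is computed only from the rows where both A and B are present.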
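A minimal sketch of the behaviour, using made-up toy vectors in place of the real columns (the values and the names r_default and r_pairwise are purely illustrative):

```r
# Hypothetical toy data standing in for columns A and B
A <- c(1, 2, NA, 4, 5)
B <- c(2, 4, 6, NA, 10)

# Default is use = "everything": any NA in the inputs makes the result NA
r_default <- cor(A, B)

# Keep only the rows where both A and B are observed (rows 1, 2 and 5 here)
r_pairwise <- cor(A, B, use = "pairwise.complete.obs")

print(r_default)
print(r_pairwise)
```

With only two vectors, "pairwise.complete.obs" and "complete.obs" use the same rows; the two options only differ once you pass a matrix or data frame with more than two columns.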
To be statistically rigorous, you should also check how many entries are missing in your data and whether the missing-at-random assumption is plausible.
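As a rough starting point for that check, something like the following counts the missing entries and the rows that would actually enter the correlation. The vectors are invented for illustration, and this only quantifies missingness; it does not test whether the data are missing at random.

```r
# Hypothetical toy data standing in for columns A and B
A <- c(1, 2, NA, 4, 5, NA)
B <- c(2, NA, 6, 8, 10, NA)

n_missing_A <- sum(is.na(A))             # missing entries in A
n_missing_B <- sum(is.na(B))             # missing entries in B
n_complete  <- sum(complete.cases(A, B)) # rows where both are observed

# Cross-tabulate missingness to see whether NAs tend to occur together
missing_table <- table(A_missing = is.na(A), B_missing = is.na(B))

print(c(n_missing_A, n_missing_B, n_complete))
print(missing_table)
```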
Edit 1: Take a look at ?cor to see other options for the use parameter.
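To illustrate how two of those options differ, here is a small sketch on an invented three-column data frame (df and its values are assumptions for the example). "complete.obs" drops every row that has an NA in any column, while "pairwise.complete.obs" drops rows separately for each pair of columns.

```r
# Hypothetical data: only column C has a missing value (row 3)
df <- data.frame(
  A = c(1, 2, 3, 4, 5),
  B = c(2, 1, 4, 3, 6),
  C = c(5, 3, NA, 1, 2)
)

# Listwise deletion: row 3 is removed for every pair, including A vs B
m_complete <- cor(df, use = "complete.obs")

# Pairwise deletion: A vs B still uses all five rows
m_pairwise <- cor(df, use = "pairwise.complete.obs")

print(m_complete["A", "B"])
print(m_pairwise["A", "B"])
```

On a wide data set with scattered NAs, listwise deletion can discard many rows, whereas pairwise deletion means each entry of the matrix may be based on a different subset of rows.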
You might consider using the rcorr function in the Hmisc package.
It is very fast, and only includes pairwise complete observations. The returned object contains a matrix
Some example code is available here:
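Independently of the linked example, a minimal sketch of calling rcorr might look like this. It assumes the Hmisc package is installed (the guard skips the call otherwise), uses invented toy vectors, and relies on the components r, n and P that the Hmisc documentation describes for the returned object.

```r
# Hypothetical toy data; rcorr expects at least 5 observations
A <- c(1, 2, NA, 4, 5, 6, 7)
B <- c(2, 1, 4, NA, 6, 5, 8)

if (requireNamespace("Hmisc", quietly = TRUE)) {
  res <- Hmisc::rcorr(A, B, type = "pearson")
  print(res$r)  # correlation matrix
  print(res$n)  # number of observations used for each pair
  print(res$P)  # p-values
} else {
  message("Hmisc is not installed; install.packages('Hmisc') to run this sketch")
}
```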