
Dealing with missing values when calculating correlations

I have a huge matrix with a lot of missing values. I want to get the correlations between its variables.

1. Is the solution

cor(na.omit(matrix)) 

better than below?

cor(matrix, use = "pairwise.complete.obs") 

I have already selected only the variables that have more than 20% missing values.

2. Which method makes the most sense?

Delphine asked Sep 16 '11 13:09




1 Answer

I would vote for the second option. It sounds like you have a fair amount of missing data, so you should be looking for a sensible multiple imputation strategy to fill in the gaps. See Harrell's text "Regression Modeling Strategies" for a wealth of guidance on how to do this properly.
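To see how the two options from the question differ in practice, here is a minimal base R sketch. The toy matrix `m` and all variable names are made up for illustration and are not the asker's data. It compares listwise deletion (`na.omit`) with pairwise deletion (`use = "pairwise.complete.obs"`) and counts how many rows each coefficient is actually based on.

```r
# Hypothetical toy data: 6 observations of 3 variables, with scattered NAs.
m <- cbind(
  a = c(1, 2, 3, 4, 5, NA),
  b = c(2, 1, 4, 3, NA, 6),
  c = c(NA, 3, 2, 5, 4, 6)
)

# Option 1: listwise deletion. Any row containing an NA is dropped first,
# so every coefficient is computed from the same (smaller) set of rows.
complete_rows <- na.omit(m)
r_listwise <- cor(complete_rows)

# Option 2: pairwise deletion. Each coefficient uses all rows in which
# that particular pair of variables is observed.
r_pairwise <- cor(m, use = "pairwise.complete.obs")

# Number of rows behind each pairwise coefficient:
# 1 where a value is observed, 0 where it is missing.
observed <- (!is.na(m)) + 0
n_pairwise <- crossprod(observed)

print(nrow(complete_rows))  # rows surviving listwise deletion
print(n_pairwise)           # rows used per pair under pairwise deletion
```

With many variables and scattered gaps, listwise deletion can discard most of the rows, which is why the pairwise option tends to retain more information. The trade-off is that each coefficient then rests on a different subset of rows, and the resulting matrix is not guaranteed to be positive semi-definite. Neither option replaces a proper imputation strategy.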

IRTFM answered Sep 17 '22 07:09