I am trying to impute missing values in a medium-sized data frame (~100,000 rows) in which 5 of the 30 columns contain NAs (a large proportion, around 60%).
I tried mice with the following code:
library(mice)
data_3 = complete(mice(data_2))
After the first iteration I got the following error:
iter imp variable
1 1 Existing_EMI Loan_Amount Loan_Period
Error in solve.default(xtx + diag(pen)): system is computationally singular: reciprocal condition number = 1.08007e-16
Is there another package that is more robust in this kind of situation? How can I deal with this problem?
Your 5 columns might include a number of unbalanced factors. When these are turned into dummy variables, there is a high probability that one column will be a linear combination of others. The default imputation methods of mice involve linear regression, so this produces an X matrix that cannot be inverted, which results in your error.
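To illustrate the collinearity problem described above, here is a minimal base R sketch. The data frame df and its columns region and zone are hypothetical toy data, not taken from the question; they are built so that one factor duplicates the information in the other.

```r
# Hypothetical toy data: two factors that carry exactly the same information.
df <- data.frame(
  region = factor(c("north", "south", "north", "south", "east", "east")),
  zone   = factor(c("N", "S", "N", "S", "E", "E"))
)

# Expand the factors into dummy variables, as a regression would.
X <- model.matrix(~ region + zone, data = df)

# The rank is lower than the number of columns, so X'X cannot be inverted.
rank_X <- qr(X)$rank
n_cols <- ncol(X)

# solve() fails on the cross-product; the error message is captured as text.
result <- tryCatch(
  solve(crossprod(X)),
  error = function(e) conditionMessage(e)
)
```

Comparing rank_X with n_cols on the dummy-coded version of your own predictors is one way to check whether this is what is happening in your data.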
Change the method to something else, such as cart: mice(data_2, method = "cart"). Also set a seed before or during imputation so that the results are reproducible.
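As a rough sketch of that suggestion, the following uses nhanes, a small example dataset shipped with mice, as a stand-in for data_2 (an assumption, since the original data is not available). The seed value 123 is arbitrary.

```r
library(mice)

# nhanes stands in for data_2 here; replace it with your own data frame.
# seed makes the imputations reproducible; printFlag = FALSE hides the log.
imp <- mice(nhanes, method = "cart", seed = 123, printFlag = FALSE)

# Extract the first completed dataset (the default behaviour of complete()).
data_3 <- complete(imp)

# Count the missing values that remain after imputation.
remaining_na <- sum(is.na(data_3))
```

Passing seed to mice() is an alternative to calling set.seed() beforehand; either should give repeatable imputations.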
My advice is to go through the 7 vignettes of mice. There you can find out how to change the imputation method for individual columns instead of for the whole dataset.
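A per-column setup might look like the sketch below. It again assumes the nhanes example dataset from mice in place of your data, and the choice of which columns get cart (bmi and chl) is purely illustrative.

```r
library(mice)

# Start from the default method that mice would pick for each column.
meth <- make.method(nhanes)

# Override the method only for selected columns (illustrative choice).
meth["bmi"] <- "cart"
meth["chl"] <- "cart"

# Run the imputation with the per-column method vector.
imp <- mice(nhanes, method = meth, seed = 123, printFlag = FALSE)

# Inspect which method was used for each column.
used_methods <- imp$method
```

Columns that are not overridden keep their defaults, so you can limit the change to the variables that trigger the singularity.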