I know that similar questions have been asked before (e.g., 1, 2, 3), but I still do not understand why mice fails to impute the missing values, even when I try unconditional mean imputation as in example 1.
The sparse matrix I have is:
k1 k3 k5 k6 k7 k8 k11 k12 k13 k14 k15
[1,] NA NA NA NA NA NA NA NA NA NA 0.066667
[2,] 0.909091 NA NA NA NA 0.944723 NA NA 0.545455 NA NA
[3,] 0.545455 NA NA NA NA NA NA NA 0.818182 0.800000 0.466667
[4,] 0.545455 NA 0.642857 NA NA 0.260954 NA NA NA NA NA
[5,] NA 0.750 0.500000 NA 0.869845 NA 0.595013 NA NA NA NA
[6,] 0.727273 0.625 NA 0.583333 NA NA NA 0.500000 0.545455 NA NA
[7,] NA NA 0.571429 NA NA NA NA NA NA NA 0.866667
[8,] 0.545455 NA NA NA NA 0.905593 0.677757 NA NA NA NA
[9,] NA 0.999 0.714286 0.750000 NA NA 0.881032 NA NA 0.933333 0.733333
[10,] NA 0.750 NA NA NA NA NA NA 0.545455 NA NA
[11,] NA NA NA NA NA NA NA NA 0.818182 NA NA
[12,] NA 0.999 NA 0.583333 NA NA 0.986145 0.666667 0.909091 NA NA
[13,] 0.818182 NA 0.857143 0.583333 0.001000 NA NA NA NA 0.133333 NA
[14,] NA 0.999 0.357143 NA 0.635087 NA NA NA NA NA NA
[15,] NA 0.750 0.857143 0.250000 0.742082 0.001000 0.001000 NA 0.636364 NA 0.533333
[16,] NA 0.999 NA 0.250000 NA NA NA NA 0.909091 NA NA
[17,] 0.727273 0.999 0.001000 NA NA NA 0.886366 0.666667 0.909091 0.800000 0.933333
[18,] NA NA 0.571429 NA NA 0.953382 NA 0.833333 0.727273 NA NA
[19,] NA NA NA NA 0.661476 NA NA 0.500000 NA 0.933333 0.600000
[20,] NA NA 0.857143 NA 0.661661 0.459014 0.283793 NA NA NA NA
[21,] NA NA NA NA NA NA NA NA NA NA 0.800000
[22,] 0.454545 NA NA NA NA NA NA 0.333333 0.727273 NA 0.533333
[23,] NA NA NA 0.333333 0.790737 NA NA NA 0.727273 0.433333 NA
[24,] NA 0.875 NA NA NA NA NA NA NA 0.999000 NA
[25,] NA NA 0.571429 0.583333 NA NA 0.196147 0.500000 NA NA NA
[26,] NA 0.999 0.642857 0.250000 NA NA NA NA 0.636364 0.700000 NA
[27,] NA NA 0.714286 NA NA NA NA NA NA NA NA
[28,] NA 0.875 NA 0.500000 NA NA NA NA NA NA 0.666667
[29,] 0.636364 0.750 NA NA NA 0.999000 0.999000 NA NA NA NA
[30,] 0.727273 NA NA NA 0.916098 0.734748 NA NA NA 0.833333 NA
[31,] NA NA NA NA NA NA NA NA NA NA 0.733333
[32,] NA 0.875 NA 0.500000 NA NA NA NA 0.818182 NA NA
[33,] 0.636364 NA NA NA NA NA 0.829819 NA 0.727273 NA 0.733333
[34,] NA NA 0.500000 NA NA NA NA NA NA NA 0.666667
[35,] NA NA 0.214286 NA NA 0.529592 NA 0.001000 0.909091 NA NA
[36,] NA NA NA 0.416667 0.808369 NA NA 0.500000 0.909091 0.633333 0.733333
[37,] NA NA 0.357143 NA NA 0.837555 0.755077 NA 0.818182 NA NA
[38,] NA NA NA 0.166667 0.841643 0.364216 NA NA NA 0.733333 NA
[39,] NA NA 0.500000 0.750000 NA NA NA NA 0.818182 0.999000 0.800000
[40,] NA NA NA NA 0.931836 NA NA NA NA NA 0.133333
[41,] NA NA 0.714286 NA NA 0.848688 NA NA NA NA NA
[42,] NA NA 0.214286 0.333333 0.700812 0.208412 NA 0.333333 NA NA NA
[43,] 0.454545 NA NA NA 0.109326 0.346767 0.877241 0.833333 NA NA NA
[44,] 0.818182 NA 0.857143 NA NA 0.931636 NA NA NA 0.733333 NA
[45,] 0.363636 0.750 NA NA NA NA NA 0.166667 0.818182 NA NA
[46,] NA NA 0.785714 NA 0.738672 NA NA NA NA 0.100000 NA
[47,] 0.181818 NA NA NA NA NA NA NA NA NA 0.001000
[48,] NA NA 0.001000 0.083333 0.308050 0.139592 NA 0.166667 NA NA NA
[49,] NA NA NA NA 0.561841 0.817696 NA 0.666667 NA 0.300000 NA
[50,] NA NA NA 0.416667 NA NA NA NA 0.545455 NA 0.866667
[51,] NA 0.875 NA NA 0.039781 NA NA NA NA 0.933333 NA
[52,] NA NA 0.357143 NA NA NA NA 0.333333 NA NA NA
[53,] NA 0.999 NA NA NA 0.835015 NA NA NA 0.833333 0.666667
[54,] NA 0.750 NA 0.416667 NA NA 0.623528 0.333333 0.818182 NA NA
[55,] NA NA NA 0.666667 NA 0.878312 NA NA NA NA NA
And I apply the following standard mice call:
res <- mice(Sparse_Data, maxit = 30, meth = "mean", seed = 500, print = FALSE)
t <- complete(res, action = "long", include = TRUE)  # stack the original data plus all m imputed datasets
out <- split(t, f = t$.imp)[-1]                      # drop the original (unimputed) data (.imp == 0)
a <- Reduce("+", out) / length(out)                  # average across the imputations
data_Pred <- a[, 3:ncol(a)]                          # drop the .imp and .id bookkeeping columns
The imputed matrix I get is:
k1 k3 k5 k6 k7 k8 k11 k12 k13 k14 k15
56 0.6060607 0.8676667 0.5373542 0.4429824 0.6069598 0.6313629 NA 0.4583958 0.7561986 0.6959606 0.066667
57 0.9090910 0.8676667 0.5373542 0.4429824 0.6069598 0.9447230 NA 0.4583958 0.5454550 0.6959606 NA
58 0.5454550 0.8676667 0.5373542 0.4429824 0.6069598 0.6313629 NA 0.4583958 0.8181820 0.8000000 0.466667
59 0.5454550 0.8676667 0.6428570 0.4429824 0.6069598 0.2609540 NA 0.4583958 0.7561986 0.6959606 NA
60 0.6060607 0.7500000 0.5000000 0.4429824 0.8698450 0.6313629 0.595013 0.4583958 0.7561986 0.6959606 NA
61 0.7272730 0.6250000 0.5373542 0.5833330 0.6069598 0.6313629 NA 0.5000000 0.5454550 0.6959606 NA
62 0.6060607 0.8676667 0.5714290 0.4429824 0.6069598 0.6313629 NA 0.4583958 0.7561986 0.6959606 0.866667
63 0.5454550 0.8676667 0.5373542 0.4429824 0.6069598 0.9055930 0.677757 0.4583958 0.7561986 0.6959606 NA
64 0.6060607 0.9990000 0.7142860 0.7500000 0.6069598 0.6313629 0.881032 0.4583958 0.7561986 0.9333330 0.733333
65 0.6060607 0.7500000 0.5373542 0.4429824 0.6069598 0.6313629 NA 0.4583958 0.5454550 0.6959606 NA
66 0.6060607 0.8676667 0.5373542 0.4429824 0.6069598 0.6313629 NA 0.4583958 0.8181820 0.6959606 NA
67 0.6060607 0.9990000 0.5373542 0.5833330 0.6069598 0.6313629 0.986145 0.6666670 0.9090910 0.6959606 NA
68 0.8181820 0.8676667 0.8571430 0.5833330 0.0010000 0.6313629 NA 0.4583958 0.7561986 0.1333330 NA
69 0.6060607 0.9990000 0.3571430 0.4429824 0.6350870 0.6313629 NA 0.4583958 0.7561986 0.6959606 NA
70 0.6060607 0.7500000 0.8571430 0.2500000 0.7420820 0.0010000 0.001000 0.4583958 0.6363640 0.6959606 0.533333
71 0.6060607 0.9990000 0.5373542 0.2500000 0.6069598 0.6313629 NA 0.4583958 0.9090910 0.6959606 NA
72 0.7272730 0.9990000 0.0010000 0.4429824 0.6069598 0.6313629 0.886366 0.6666670 0.9090910 0.8000000 0.933333
73 0.6060607 0.8676667 0.5714290 0.4429824 0.6069598 0.9533820 NA 0.8333330 0.7272730 0.6959606 NA
74 0.6060607 0.8676667 0.5373542 0.4429824 0.6614760 0.6313629 NA 0.5000000 0.7561986 0.9333330 0.600000
75 0.6060607 0.8676667 0.8571430 0.4429824 0.6616610 0.4590140 0.283793 0.4583958 0.7561986 0.6959606 NA
76 0.6060607 0.8676667 0.5373542 0.4429824 0.6069598 0.6313629 NA 0.4583958 0.7561986 0.6959606 0.800000
77 0.4545450 0.8676667 0.5373542 0.4429824 0.6069598 0.6313629 NA 0.3333330 0.7272730 0.6959606 0.533333
78 0.6060607 0.8676667 0.5373542 0.3333330 0.7907370 0.6313629 NA 0.4583958 0.7272730 0.4333330 NA
79 0.6060607 0.8750000 0.5373542 0.4429824 0.6069598 0.6313629 NA 0.4583958 0.7561986 0.9990000 NA
80 0.6060607 0.8676667 0.5714290 0.5833330 0.6069598 0.6313629 0.196147 0.5000000 0.7561986 0.6959606 NA
81 0.6060607 0.9990000 0.6428570 0.2500000 0.6069598 0.6313629 NA 0.4583958 0.6363640 0.7000000 NA
82 0.6060607 0.8676667 0.7142860 0.4429824 0.6069598 0.6313629 NA 0.4583958 0.7561986 0.6959606 NA
83 0.6060607 0.8750000 0.5373542 0.5000000 0.6069598 0.6313629 NA 0.4583958 0.7561986 0.6959606 0.666667
84 0.6363640 0.7500000 0.5373542 0.4429824 0.6069598 0.9990000 0.999000 0.4583958 0.7561986 0.6959606 NA
85 0.7272730 0.8676667 0.5373542 0.4429824 0.9160980 0.7347480 NA 0.4583958 0.7561986 0.8333330 NA
86 0.6060607 0.8676667 0.5373542 0.4429824 0.6069598 0.6313629 NA 0.4583958 0.7561986 0.6959606 0.733333
87 0.6060607 0.8750000 0.5373542 0.5000000 0.6069598 0.6313629 NA 0.4583958 0.8181820 0.6959606 NA
88 0.6363640 0.8676667 0.5373542 0.4429824 0.6069598 0.6313629 0.829819 0.4583958 0.7272730 0.6959606 0.733333
89 0.6060607 0.8676667 0.5000000 0.4429824 0.6069598 0.6313629 NA 0.4583958 0.7561986 0.6959606 0.666667
90 0.6060607 0.8676667 0.2142860 0.4429824 0.6069598 0.5295920 NA 0.0010000 0.9090910 0.6959606 NA
91 0.6060607 0.8676667 0.5373542 0.4166670 0.8083690 0.6313629 NA 0.5000000 0.9090910 0.6333330 0.733333
92 0.6060607 0.8676667 0.3571430 0.4429824 0.6069598 0.8375550 0.755077 0.4583958 0.8181820 0.6959606 NA
93 0.6060607 0.8676667 0.5373542 0.1666670 0.8416430 0.3642160 NA 0.4583958 0.7561986 0.7333330 NA
94 0.6060607 0.8676667 0.5000000 0.7500000 0.6069598 0.6313629 NA 0.4583958 0.8181820 0.9990000 0.800000
95 0.6060607 0.8676667 0.5373542 0.4429824 0.9318360 0.6313629 NA 0.4583958 0.7561986 0.6959606 0.133333
96 0.6060607 0.8676667 0.7142860 0.4429824 0.6069598 0.8486880 NA 0.4583958 0.7561986 0.6959606 NA
97 0.6060607 0.8676667 0.2142860 0.3333330 0.7008120 0.2084120 NA 0.3333330 0.7561986 0.6959606 NA
98 0.4545450 0.8676667 0.5373542 0.4429824 0.1093260 0.3467670 0.877241 0.8333330 0.7561986 0.6959606 NA
99 0.8181820 0.8676667 0.8571430 0.4429824 0.6069598 0.9316360 NA 0.4583958 0.7561986 0.7333330 NA
100 0.3636360 0.7500000 0.5373542 0.4429824 0.6069598 0.6313629 NA 0.1666670 0.8181820 0.6959606 NA
101 0.6060607 0.8676667 0.7857140 0.4429824 0.7386720 0.6313629 NA 0.4583958 0.7561986 0.1000000 NA
102 0.1818180 0.8676667 0.5373542 0.4429824 0.6069598 0.6313629 NA 0.4583958 0.7561986 0.6959606 0.001000
103 0.6060607 0.8676667 0.0010000 0.0833330 0.3080500 0.1395920 NA 0.1666670 0.7561986 0.6959606 NA
104 0.6060607 0.8676667 0.5373542 0.4429824 0.5618410 0.8176960 NA 0.6666670 0.7561986 0.3000000 NA
105 0.6060607 0.8676667 0.5373542 0.4166670 0.6069598 0.6313629 NA 0.4583958 0.5454550 0.6959606 0.866667
106 0.6060607 0.8750000 0.5373542 0.4429824 0.0397810 0.6313629 NA 0.4583958 0.7561986 0.9333330 NA
107 0.6060607 0.8676667 0.3571430 0.4429824 0.6069598 0.6313629 NA 0.3333330 0.7561986 0.6959606 NA
108 0.6060607 0.9990000 0.5373542 0.4429824 0.6069598 0.8350150 NA 0.4583958 0.7561986 0.8333330 0.666667
109 0.6060607 0.7500000 0.5373542 0.4166670 0.6069598 0.6313629 0.623528 0.3333330 0.8181820 0.6959606 NA
110 0.6060607 0.8676667 0.5373542 0.6666670 0.6069598 0.8783120 NA 0.4583958 0.7561986 0.6959606 NA
Maybe someone can shed some light on the problem?
Answer
You have perfectly collinear columns in your dataset, specifically k11 and k14, and k8 and k15. The default behavior of mice is to remove perfectly collinear columns.
Solutions
- Drop one column from each collinear pair before imputing; you can identify the pairs with mice:::find.collinear(Sparse_Data).
- Supply your own predictor matrix, e.g. mice(pred = my_prediction_matrix).
Details
mice relies on its predictor matrix. This is a matrix that determines from which columns the missing values of each variable are predicted. If a variable's entries in this matrix are all zero, that variable will not be imputed, regardless of what method you specify.
You can check this matrix by running mice and then typing res$pred. As you can see, the entries for k11 and k15 are empty, and therefore they aren't imputed.
So why does mice empty those two columns? Well, mice calls the check.data function, which in turn calls find.collinear. This function identifies which variables are collinear, and mice removes those columns in subsequent steps.
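These decisions are recorded in the mids object itself, so you can verify them directly. A quick check, reusing the res object fitted in the question (the exact contents of loggedEvents vary by mice version):

```r
res$pred          # predictor matrix mice actually used; removed variables are zeroed out
res$loggedEvents  # structured report of detected problems; should flag k11 and k15 as collinear
```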
Are any of your columns collinear? Well, yes:
cor(Sparse_Data, use = "pairwise.complete.obs")
k1 k3 k5 k6 k7 k8 k11 k12 k13 k14 k15
k1 1.0000000 1.740412e-01 0.24932705 NA 0.17164319 0.640984131 0.3053596 0.4225772 -0.536055739 -0.50460872 0.97321365
k3 0.1740412 1.000000e+00 -0.42409199 -9.370804e-05 -0.38583663 0.361416106 0.5515156 0.6567106 0.634250161 -0.70631658 0.74001342
k5 0.2493271 -4.240920e-01 1.00000000 4.471829e-01 0.02679894 0.234850334 -0.6624768 0.4201946 -0.924517670 -0.45408744 -0.78628746
k6 NA -9.370804e-05 0.44718290 1.000000e+00 -0.35377747 0.818644775 0.6824749 0.8899878 0.147657537 0.27030472 0.49159991
k7 0.1716432 -3.858366e-01 0.02679894 -3.537775e-01 1.00000000 0.207791538 -0.6406942 -0.2863018 0.898687181 0.14987951 -0.70210859
k8 0.6409841 3.614161e-01 0.23485033 8.186448e-01 0.20779154 1.000000000 0.7491736 0.5219197 0.002468839 -0.13067177 1.00000000
k11 0.3053596 5.515156e-01 -0.66247684 6.824749e-01 -0.64069422 0.749173578 1.0000000 0.5925582 0.830372468 -1.00000000 0.83452358
k12 0.4225772 6.567106e-01 0.42019459 8.899878e-01 -0.28630180 0.521919747 0.5925582 1.0000000 -0.134937885 -0.49251775 0.92582043
k13 -0.5360557 6.342502e-01 -0.92451767 1.476575e-01 0.89868718 0.002468839 0.8303725 -0.1349379 1.000000000 0.29508347 0.13853862
k14 -0.5046087 -7.063166e-01 -0.45408744 2.703047e-01 0.14987951 -0.130671767 -1.0000000 -0.4925177 0.295083470 1.00000000 0.02558161
k15 0.9732137 7.400134e-01 -0.78628746 4.915999e-01 -0.70210859 1.000000000 0.8345236 0.9258204 0.138538625 0.02558161 1.00000000
As you can see, k11 is perfectly correlated with k14, and k15 with k8. This is why they get kicked out. As expected:
mice:::find.collinear(Sparse_Data)
# [1] "k11" "k15"
Demonstration #1 (NOT a solution)
Try specifying mice(pred = diag(ncol(Sparse_Data)), ...). You'll see that it now works. [Edit: for future readers, this is not a way to solve the problem, only to show where the problem lies.]
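If you do want to keep both columns of each pair in the model, a hand-built predictor matrix is one option. This is only a sketch of what that configuration could look like (in the predictor matrix, rows are targets and columns are predictors; depending on the mice version, the data-level collinearity check may still intervene):

```r
vars <- colnames(Sparse_Data)
pred <- matrix(1, ncol(Sparse_Data), ncol(Sparse_Data),
               dimnames = list(vars, vars))
diag(pred) <- 0          # a variable never predicts itself
pred["k11", "k14"] <- 0  # break each perfectly collinear pair
pred["k14", "k11"] <- 0
pred["k15", "k8"]  <- 0
pred["k8",  "k15"] <- 0
res2 <- mice(Sparse_Data, pred = pred, maxit = 30,
             meth = "mean", seed = 500, print = FALSE)
```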
Demonstration #2 (NOT a solution)
Try running the following before your code and you'll see that the imputation then works:
Sparse_Data$k11[1] <- 2
Sparse_Data$k15[1] <- 2
Sparse_Data$k8[1] <- 0.5
Sparse_Data$k14[1] <- 0.5
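Probably the cleanest real fix is the first solution above: drop one column from each collinear pair before imputing. A sketch (with the caveat that the dropped columns are then no longer imputed at all, so if you need values for them, breaking the collinearity as in Demonstration #2 may suit you better):

```r
collinear <- mice:::find.collinear(Sparse_Data)   # should return "k11" "k15"
keep      <- setdiff(colnames(Sparse_Data), collinear)
res3      <- mice(Sparse_Data[, keep], maxit = 30,
                  meth = "mean", seed = 500, print = FALSE)
```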