Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Prediction with lme4 on new levels

I'm trying to fit a mixed effects model and then use that model to generate estimates on a new dataset that may have different levels. I expected that the estimates on a new dataset would use the mean value of the estimated parameters, but that doesn't seem to be the case. Here's a minimum working example:

library(lme4)
d = data.frame(x = rep(1:10, times = 3),
               y = NA,
               grp = rep(1:3, each = 10))
d$y[d$grp == 1] = 1:10 + rnorm(10)
d$y[d$grp == 2] = 1:10 * 1.5 + rnorm(10)
d$y[d$grp == 3] = 1:10 * 0.5 + rnorm(10)
fit = lmer(y ~ (1+x)|grp, data = d)
newdata = data.frame(x = 1:10, grp = 4)
predict(fit, newdata = newdata, allow.new.levels = TRUE)

In this example, I'm essentially defining three groups with different regression equations (slopes of 1, 1.5 and 0.5). However, when I try to predict on a new dataset with an unseen level, I get a constant estimate. I would have expected the expected value of the slope and intercept to be used to generate predictions for this new data. Am I expecting the wrong thing? Or, what am I doing wrong with my code?

like image 929
random_forest_fanatic Avatar asked Mar 25 '15 15:03

random_forest_fanatic


2 Answers

I generally wouldn't include a random slope without including a fixed slope. It seems like predict.merMod agrees with me, because it seems to simply use only the fixed effects to predict for new levels. The documentation says "the prediction will use the unconditional (population-level) values for data with previously unobserved levels", but these values don't seem to be estimated with your model specification.

Thus, I suggest this model:

fit = lmer(y ~ x + (x|grp), data = d)
newdata = data.frame(x = 1:10, grp = 4)
predict(fit, newdata = newdata, allow.new.levels = TRUE)
#       1         2         3         4         5         6         7         8         9        10 
#1.210219  2.200685  3.191150  4.181616  5.172082  6.162547  7.153013  8.143479  9.133945 10.124410

This is the same as only using the fixed effects part of the model:

t(cbind(1, newdata$x) %*% fixef(fit))
#         [,1]     [,2]    [,3]     [,4]     [,5]     [,6]     [,7]     [,8]     [,9]    [,10]
#[1,] 1.210219 2.200685 3.19115 4.181616 5.172082 6.162547 7.153013 8.143479 9.133945 10.12441
like image 106
Roland Avatar answered Oct 23 '22 01:10

Roland


Maybe it's not clear enough, but I think the documentation for ?predict.merMod states (reasonably) clearly what happens when allow.new.levels=TRUE. I guess the ambiguity might be in what "unconditional (population-level) values" means ...

allow.new.levels: logical if new levels (or NA values) in ‘newdata’ are allowed. If FALSE (default), such new values in ‘newdata’ will trigger an error; if TRUE, then the prediction will use the unconditional (population-level) values for data with previously unobserved levels (or NAs).

like image 21
Ben Bolker Avatar answered Oct 23 '22 02:10

Ben Bolker