Prediction with lme4 on new levels

Question

I'm trying to fit a mixed effects model and then use that model to generate estimates on a new dataset that may have different levels. I expected that the estimates on a new dataset would use the mean value of the estimated parameters, but that doesn't seem to be the case. Here's a minimum working example:

library(lme4)
d = data.frame(x = rep(1:10, times = 3),
               y = NA,
               grp = rep(1:3, each = 10))
d$y[d$grp == 1] = 1:10 + rnorm(10)
d$y[d$grp == 2] = 1:10 * 1.5 + rnorm(10)
d$y[d$grp == 3] = 1:10 * 0.5 + rnorm(10)
fit = lmer(y ~ (1+x)|grp, data = d)
newdata = data.frame(x = 1:10, grp = 4)
predict(fit, newdata = newdata, allow.new.levels = TRUE)

In this example, I'm essentially defining three groups with different regression equations (slopes of 1, 1.5 and 0.5). However, when I try to predict on a new dataset with an unseen level, I get a constant estimate. I would have expected the expected value of the slope and intercept to be used to generate predictions for this new data. Am I expecting the wrong thing? Or, what am I doing wrong with my code?

Roland · Accepted Answer

I generally wouldn't include a random slope without including a fixed slope. It seems like predict.merMod agrees with me, because it seems to simply use only the fixed effects to predict for new levels. The documentation says "the prediction will use the unconditional (population-level) values for data with previously unobserved levels", but these values don't seem to be estimated with your model specification.

Thus, I suggest this model:

fit = lmer(y ~ x + (x|grp), data = d)
newdata = data.frame(x = 1:10, grp = 4)
predict(fit, newdata = newdata, allow.new.levels = TRUE)
#       1         2         3         4         5         6         7         8         9        10 
#1.210219  2.200685  3.191150  4.181616  5.172082  6.162547  7.153013  8.143479  9.133945 10.124410

This is the same as only using the fixed effects part of the model:

t(cbind(1, newdata$x) %*% fixef(fit))
#         [,1]     [,2]    [,3]     [,4]     [,5]     [,6]     [,7]     [,8]     [,9]    [,10]
#[1,] 1.210219 2.200685 3.19115 4.181616 5.172082 6.162547 7.153013 8.143479 9.133945 10.12441

Ben Bolker · Answer

Maybe it's not clear enough, but I think the documentation for ?predict.merMod states (reasonably) clearly what happens when allow.new.levels=TRUE. I guess the ambiguity might be in what "unconditional (population-level) values" means ...

allow.new.levels: logical if new levels (or NA values) in ‘newdata’ are allowed. If FALSE (default), such new values in ‘newdata’ will trigger an error; if TRUE, then the prediction will use the unconditional (population-level) values for data with previously unobserved levels (or NAs).

Prediction with lme4 on new levels

Tags:

r

prediction

lme4

mixed-models

random-effects

random_forest_fanatic

2 Answers

Roland

Ben Bolker

Recent Activity

Donate For Us

Prediction with lme4 on new levels

Tags:

r

prediction

lme4

mixed-models

random-effects

random_forest_fanatic

2 Answers

Roland

Ben Bolker

Related questions

Recent Activity

Donate For Us