glmer from R package lme4 asking to scale variables even though variables already scaled

Tags:

r

scaling

glm

lme4

I have a dataset with 27 variables and ~30,000 observations. The first 17 variables are continuous and the rest are binary. When I run glmer with all of the variables as fixed effects plus a random intercept for subject ID, I keep getting this warning:

In checkConv(attr(opt, "derivs"), opt$par, ctrl=control$checkConv, :
Model is nearly unidentifiable: very large eigenvalue
- Rescale variables?

All of the continuous variables were scaled with the scale function, with center and scale both set to TRUE, so I don't understand why I keep getting this message. Some of the variables are a little skewed; could that be causing the warning?
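For reference, here is a minimal sketch of the preprocessing and model described above; all object and column names (dat, y, x1, x2, b1, b2, subject_id) are placeholders, not from the original post:

library(lme4)

## standardize the 17 continuous predictors (columns 1:17 in this sketch)
dat[, 1:17] <- scale(dat[, 1:17], center = TRUE, scale = TRUE)

## binary outcome, all predictors as fixed effects, random intercept per subject
fit <- glmer(y ~ x1 + x2 + b1 + b2 + (1 | subject_id),
             data = dat, family = binomial)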

asked Dec 31 '15 by notaclue1980

1 Answer

tl;dr: if in doubt, refit with a different optimizer and make sure your results are stable, but I would probably be willing to ignore this warning, especially because your data set is large (>10,000 observations).
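A sketch of that stability check, assuming the original fit is stored in an object called fit (a placeholder name):

library(lme4)

## refit with two different optimizers
fit_bobyqa <- update(fit, control = glmerControl(optimizer = "bobyqa"))
fit_nm     <- update(fit, control = glmerControl(optimizer = "Nelder_Mead"))

## if the log-likelihoods and fixed effects agree to several decimal places,
## the convergence warning is most likely a false positive
logLik(fit_bobyqa)
logLik(fit_nm)
max(abs(fixef(fit_bobyqa) - fixef(fit_nm)))

(Newer versions of lme4 also provide allFit(), which refits a model with all available optimizers in one call.)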

lme4 is reporting that some of the eigenvalues of the estimated Hessian (the matrix of second derivatives of the objective with respect to the parameters) are large (>500); this suggests possible numerical instability, which can sometimes be resolved (if you haven't done it already) by centering and scaling the predictor variables.
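If you want to look at those eigenvalues yourself, they can be computed from the Hessian that lme4 stores with the fit; a sketch, with fit again a placeholder name (the exact slot layout may vary across lme4 versions):

## Hessian as estimated by lme4's internal finite-difference code
hh <- fit@optinfo$derivs$Hessian
eigen(hh, only.values = TRUE)$values  # the warning is triggered by very large values here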

However, I'm guessing that this is due to a bad estimate of the Hessian, which in turn gives a misleading estimate of the eigenvalues. This is a bit of a dirty secret of lme4: ever since we introduced convergence tests a few releases ago, we've been trying to get them right (which is hard). In particular, we use a naive finite-difference approximation of the Hessian, which works poorly for large (>10,000 obs) data sets. Here's an example from a simulation study (results in full here): blue points are minimum eigenvalues of the Hessian estimated via Richardson extrapolation (numDeriv::hessian), pink points are minimum eigenvalues using our naive finite-difference rule. Panels correspond to different optimizers; the top row is unconstrained, the bottom row is clamped to the range (0.5, 5).

[Figure: minimum eigenvalues of the Hessian from the simulation study, estimated via Richardson extrapolation (blue) vs. lme4's naive finite-difference rule (pink), panelled by optimizer]
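To check whether the warning for your own fit is an artifact of the finite-difference Hessian, you can recompute the Hessian with numDeriv and compare. A sketch along the lines of lme4's ?convergence help page, with fit again a placeholder name and assuming the model was fitted with the default nAGQ = 1:

library(lme4)
library(numDeriv)

## rebuild the deviance function and the fitted parameter vector
## (for a GLMM with nAGQ >= 1 the parameters are c(theta, beta))
devfun <- update(fit, devFunOnly = TRUE)
pars   <- unlist(getME(fit, c("theta", "fixef")))

## Hessian via Richardson extrapolation; compare its eigenvalues with the
## ones reported by the naive finite-difference approximation
hess2 <- hessian(devfun, pars)
eigen(hess2, only.values = TRUE)$values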

answered by Ben Bolker