Lately I have been trying to fit a lot of random effects models to relatively big datasets. Let’s say about 50,000 people (or more) observed at up to 25 time points. With such a large sample size, we include a lot of predictors that we’re adjusting for – maybe 50 or so fixed effects. I’m fitting the model to a binary outcome using lme4::glmer in R, with random intercepts for each subject. I can't go into specifics on the data, but the basic format of the glmer command I used was:
fit <-  glmer(outcome ~ treatment + study_quarter + dd_quarter + (1|id),
              family = "binomial", data = dat)
where both study_quarter and dd_quarter are factors with about 20 levels each.
When I try to fit this model in R, it runs for about 12-15 hours, and returns an error that it failed to converge. I did a bunch of troubleshooting (e.g., following these guidelines), with no improvement. And the convergence isn’t even close in the end (max gradient around 5-10, whereas the convergence criterion is 0.001 I think).
I then tried fitting the model in Stata, using the melogit command. The model fit in under 2 mins, with no convergence issues. The corresponding Stata command is
melogit outcome treatment i.study_quarter i.dd_quarter || id:
What gives? Does Stata just have a better fitting algorithm, or one better optimized for large models and large datasets? It’s really surprising how different the run times were.
The glmer fit will probably be much faster with the optional argument nAGQ=0L.  You have many fixed-effects parameters (20 levels for each of study_quarter and dd_quarter generate a total of 28 contrasts) and the default optimization method (corresponding to nAGQ=1L) puts all of those coefficients into the general nonlinear optimization call.  With nAGQ=0L these coefficients are optimized within the much faster penalized iteratively reweighted least squares (PIRLS) algorithm.  The default will generally provide a better estimate in the sense that the deviance at the estimate is lower, but the difference is usually very small and the time difference is enormous.
I have a write-up of the differences in these algorithms as a Jupyter notebook nAGQ.ipynb.  That writeup uses the MixedModels package for Julia instead of lme4 but the methods are similar.  (I am one of the authors of lme4 and the author of MixedModels.)
If you are going to be doing a lot of GLMM fits I would consider doing so in Julia with MixedModels.  It is often much faster than R, even with all the complicated code in lme4.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With