I would like to share some of my thoughts on trying to improve the model fitting time of a linear mixed-effects model in R using the lme4 package.
Dataset Size: The dataset consists of approximately 400,000 rows and 32 columns. Unfortunately, no information can be shared about the nature of the data.
Assumptions and Checks: The response variable is assumed to come from a Normal distribution. Prior to model fitting, variables were tested for collinearity and multicollinearity using correlation tables and the alias function provided in R.
Continuous variables were scaled in order to help convergence.
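As an illustration, the scaling could be done along these lines (a minimal sketch; the column names are placeholders, not the actual variables):

# Centre and scale the continuous predictors (placeholder names)
cont_vars <- c("Var1", "Var2", "Var3")
data[cont_vars] <- lapply(data[cont_vars], function(x) as.numeric(scale(x)))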
Model Structure: The model equation contains 31 fixed effects (including the intercept) and 30 random effects (no random intercept). The random effects are grouped by a factor variable with 2700 levels. The covariance structure is Variance Components, since the random effects are assumed to be independent of one another.
Model equation example:
lmer(Response ~ 1 + Var1 + Var2 + ... + Var30 + (Var1 - 1 | Group) + (Var2 - 1 | Group) + ... + (Var30 - 1 | Group), data = data, REML = TRUE)
The model was fitted successfully; however, it took about 3.1 hours to produce results. The same model in SAS took a few seconds. There is literature available on the web on how to reduce fitting time by using the non-linear optimization algorithm nloptwrap and by turning off the time-consuming derivative calculation performed after the optimization has finished (calc.derivs = FALSE):
https://cran.r-project.org/web/packages/lme4/vignettes/lmerperf.html
Time was reduced by 78%.
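For reference, a sketch of how these settings can be passed through lmerControl, shortened to two placeholder variables rather than the full set of thirty:

library(lme4)
# Switch to the nloptwrap optimizer and skip the post-fit derivative calculation
ctrl <- lmerControl(optimizer = "nloptwrap", calc.derivs = FALSE)
fit <- lmer(Response ~ 1 + Var1 + Var2 + (Var1 - 1 | Group) + (Var2 - 1 | Group),
            data = data, REML = TRUE, control = ctrl)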
Question: Is there any other way to reduce the model fitting time by setting the lmer parameter inputs accordingly? The difference between R and SAS in model fitting time is enormous. Any suggestion is appreciated.
Mixed-model formulas. Like most model-fitting functions in R, lmer takes as its first two arguments a formula specifying the model and the data with which to evaluate the formula. This second argument, data, is optional but recommended and is usually the name of an R data frame.
The lmer() function is for linear mixed models and the glmer() function is for generalized linear mixed models.
The second option tells glmer to fit using the "nloptwrap" optimizer (there are several other optimizers available, too), which tends to be faster than the default optimization method. The impact can be rather startling. With the default options the above model takes about 3 hours to fit.
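A sketch of passing that optimizer to glmer via glmerControl; the model, data frame, and family here are placeholders rather than anything from the question:

library(lme4)
# Ask glmer to use the nloptwrap optimizer instead of the default
ctrl <- glmerControl(optimizer = "nloptwrap")
fit <- glmer(y ~ x + (1 | group), data = df, family = binomial, control = ctrl)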
lme4 automatically constructs the random-effects model matrix (Z) as a sparse matrix. At present it does not offer an option for a sparse fixed-effects model matrix (X), which would be useful when the fixed-effects model includes factors with many levels.
lmer() determines the parameter estimates by optimizing the profiled log-likelihood or profiled REML criterion with respect to the parameters in the covariance matrix of the random effects. In your example there will be 30 such parameters, corresponding to the standard deviations of the random effects from each of the 30 terms. Constrained optimizations of that size take time.
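To see this concretely, both pieces can be inspected on a fitted object with getME (here fit stands for the fitted model from the question):

library(lme4)
Z     <- getME(fit, "Z")      # random-effects model matrix, stored as a sparse matrix
theta <- getME(fit, "theta")  # covariance parameters being optimized (relative standard deviations)
length(theta)                 # one per random-effects term: 30 in the model above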
It is possible that SAS PROC MIXED has specific optimization methods or has more sophisticated ways of determining starting estimates. SAS being a closed-source system means we won't know what they do.
By the way, you can write the random-effects part of the formula more compactly using the double-bar syntax, (0 + Var1 + Var2 + ... + Var30 || Group), which lme4 expands into separate independent terms.
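A sketch of the full call with that compact notation, again shortened to two placeholder variables and combined with the control settings mentioned earlier:

library(lme4)
# The double-bar syntax expands into one uncorrelated random-slope term per variable
fit <- lmer(Response ~ 1 + Var1 + Var2 + (0 + Var1 + Var2 || Group),
            data = data, REML = TRUE,
            control = lmerControl(optimizer = "nloptwrap", calc.derivs = FALSE))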
We have implemented random intercepts regression assuming compound symmetry in the R package Rfast. The command is rint.reg. It is 30+ times faster than the corresponding lme4 function. I do not know if this helps, but just in case.
https://cran.r-project.org/web/packages/Rfast/index.html
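For completeness, a minimal sketch of how rint.reg might be called. Note that it fits only a random intercept, not the random-slope structure above, and the argument order shown (response, covariate matrix, grouping index) is my reading of the package documentation and should be checked against it:

library(Rfast)
y  <- data$Response                        # numeric response (placeholder name)
x  <- as.matrix(data[, c("Var1", "Var2")]) # numeric covariate matrix (placeholders)
id <- as.numeric(factor(data$Group))       # grouping variable coded as integers
fit <- rint.reg(y, x, id)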