
Big data: generalized linear mixed-effects models

I'm looking for suggestions on a strategy for fitting generalized linear mixed-effects models to a relatively large data set.

Suppose I have data on 8 million US basketball passes by about 300 teams over 10 years. The data look something like this:

data <- data.frame(count = c(1,1,2,1,1,5),
               length_pass= c(1,2,5,7,1,3),
               year= c(1,1,1,2,2,2),
               mean_length_pass_team= c(15,15,9,14,14,8),
               team= c('A', 'A', 'B', 'A', 'A', 'B'))
data
 count length_pass year mean_length_pass_team team
1     1           1    1                    15    A
2     1           2    1                    15    A
3     2           5    1                     9    B
4     1           7    2                    14    A
5     1           1    2                    14    A
6     5           3    2                     8    B

I want to explain the count of steps a player takes before passing the ball. I have theoretical reasons to assume there are team-level differences in count and length_pass, so a multilevel (i.e., mixed-effects) model seems appropriate.

My individual level control variables are length_pass and year.

At the team level I have mean_length_pass_team. Including it should help me avoid ecological fallacies, according to Snijders, 2011.
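For readers reproducing this setup, here is a minimal sketch of how a team-level mean like mean_length_pass_team could be derived from pass-level rows. It assumes the mean is taken per team and year (the example data suggest this, since the value changes between years for the same team). The data frame `passes` and the column `length_pass_within` are hypothetical names used only for illustration. The means will not match the question's example values, which presumably come from the full data.

```r
# Toy rows with the same column names as the example data (values are made up)
passes <- data.frame(
  team        = c("A", "A", "B", "A", "A", "B"),
  year        = c(1, 1, 1, 2, 2, 2),
  length_pass = c(1, 2, 5, 7, 1, 3)
)

# Team-by-year mean of length_pass, repeated on every row of that team-year
# (ave() applies mean() within each combination of the grouping variables)
passes$mean_length_pass_team <- ave(passes$length_pass, passes$team, passes$year)

# Within-group deviation: the pass-level part once the group mean is removed
passes$length_pass_within <- passes$length_pass - passes$mean_length_pass_team

passes
```

Entering the group mean alongside the pass-level variable (or its within-group deviation) is what lets the model estimate separate within-team and between-team effects.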

I have been using the lme4 and brms packages to estimate these models, but fitting takes days to weeks on my local 12-core, 128 GB machine.

library(lme4)
model_a <- glmer(count ~ length_pass + year + mean_length_pass_team + (1 | team),
                 data=data,
                 family= "poisson",
                 control=glmerControl(optCtrl=list(maxfun=2e8))) 

library(brms)
options(mc.cores = parallel::detectCores())
model_b <- brm(count ~ length_pass + year + mean_length_pass_team + (1 | team),
               data = data,
               family = "poisson")
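
Written out, the two calls above share the same likelihood structure: a Poisson model with the default log link and a random intercept per team (brms additionally places priors on the parameters). The indices i (pass) and j (team) and the symbols below are notation introduced here for illustration, not taken from the original post.

```latex
\begin{aligned}
\text{count}_{ij} \mid u_j &\sim \text{Poisson}(\lambda_{ij}) \\
\log \lambda_{ij} &= \beta_0 + \beta_1\,\text{length\_pass}_{ij} + \beta_2\,\text{year}_{ij}
                     + \beta_3\,\text{mean\_length\_pass\_team}_{ij} + u_j \\
u_j &\sim \mathcal{N}(0, \sigma_u^2)
\end{aligned}
```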

I am looking for suggestions to speed up the fitting process or new techniques to fit a generalized linear mixed-effects model:

  • (How) Can I improve the speed on the lme4 and brms fits?
  • Are there other packages to consider?
  • Are there step-wise procedures that can help increase the speed of fitting models?
  • Are there interesting options outside the R environment that can help me fit this?

Any pointers are much appreciated!

wake_wake asked Nov 07 '22 15:11


1 Answer

I have found the package MCMCglmm to be much faster than brms for models that MCMCglmm can fit (I've sometimes found brms fits models I can't fit with MCMCglmm).

You may need to toy around with the syntax, but it would be something like this:

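    library(MCMCglmm)
    # Same fixed effects as the glmer/brm calls, with a random intercept per team
    model_c <- MCMCglmm(fixed = count ~ length_pass + year + mean_length_pass_team,
                        random = ~ team,
                        family = "poisson",
                        data = data)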

A downside is that, in the past, I have found it hard to find online code examples that are tied to an explicit mathematical formulation of the model, so it can be difficult to judge whether you are fitting the model you intend to fit. However, your model seems simple enough.
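
As a rough sketch of what toying around with the syntax might look like, the example below fits a random-intercept Poisson model with MCMCglmm on simulated data and sets the sampler length explicitly. The simulated data frame `sim`, its coefficient values, and the chosen `nitt`/`burnin`/`thin` settings are assumptions made for illustration only; the team-level mean is left out to keep the simulation short. Note that MCMCglmm's Poisson family also estimates a residual ("units") variance, so the fitted model is not identical to the glmer specification.

```r
library(MCMCglmm)

set.seed(1)

# Simulated stand-in for the real data (hypothetical values, same column names)
n_team <- 30
n_obs  <- 2000
team_ids <- paste0("T", seq_len(n_team))
sim <- data.frame(
  team        = factor(sample(team_ids, n_obs, replace = TRUE), levels = team_ids),
  year        = sample(1:10, n_obs, replace = TRUE),
  length_pass = runif(n_obs, 1, 10)
)

# One random intercept per team, then Poisson counts on the log scale
u <- setNames(rnorm(n_team, sd = 0.3), team_ids)
sim$count <- rpois(n_obs, exp(0.2 + 0.05 * sim$length_pass + u[as.character(sim$team)]))

# Explicit sampler settings: (nitt - burnin) / thin = 1000 stored draws
fit <- MCMCglmm(fixed   = count ~ length_pass + year,
                random  = ~ team,
                family  = "poisson",
                data    = sim,
                nitt    = 6000,
                burnin  = 1000,
                thin    = 5,
                verbose = FALSE)

summary(fit)
```

Timing a fit like this on growing subsamples of the real data (for example with `system.time()`) gives a rough idea of how the run time scales before committing to all 8 million rows.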

Lacey Etzkorn answered Nov 14 '22 22:11