Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to have multiple groups in Python statsmodels linear mixed effects model?

I am trying to use the Python statsmodels linear mixed effects model to fit a model that has two random intercepts, e.g. two groups. I cannot figure out how to initialize the model so that I can do this.

Here's the example. I have data that looks like the following (taken from here):

subject gender  scenario    attitude    frequency
F1  F   1   pol 213.3
F1  F   1   inf 204.5
F1  F   2   pol 285.1
F1  F   2   inf 259.7
F1  F   3   pol 203.9
F1  F   3   inf 286.9
F1  F   4   pol 250.8
F1  F   4   inf 276.8

I want to make a linear mixed effects model with two random effects -- one for the subject group and one for the scenario group. I am trying to do this:

import statsmodels.api as sm
model = sm.MixedLM.from_formula("frequency ~ attitude + gender", data, groups=data[['subject', 'scenario']])
result = model.fit()
print result.summary()

I keep getting this error:

LinAlgError: Singular matrix

It works fine in R. When I use lme4 in R with the formula-based rendering it fits just fine:

politeness.model = lmer(frequency ~ attitude + gender + 
        (1|subject)  + (1|scenario), data=politeness)

I don't understand why this is happening. It works when I use any one random effect/group, e.g.

model = sm.MixedLM.from_formula("frequency ~ attitude + gender", data, groups=data['subject'])

Then I get:

                 Mixed Linear Model Regression Results
===============================================================
Model:                MixedLM   Dependent Variable:   frequency
No. Observations:     83        Method:               REML     
No. Groups:           6         Scale:                850.9456 
Min. group size:      13        Likelihood:           -393.3720
Max. group size:      14        Converged:            Yes      
Mean group size:      13.8                                     
---------------------------------------------------------------
                 Coef.   Std.Err.   z    P>|z|  [0.025   0.975]
---------------------------------------------------------------
Intercept        256.785   15.226 16.864 0.000  226.942 286.629
attitude[T.pol]  -19.415    6.407 -3.030 0.002  -31.972  -6.858
gender[T.M]     -108.325   21.064 -5.143 0.000 -149.610 -67.041
Intercept RE     603.948   23.995                              
===============================================================

Alternatively, if I do:

model = sm.MixedLM.from_formula("frequency ~ attitude + gender", data, groups=data['scenario'])

This is the result I get:

              Mixed Linear Model Regression Results
================================================================
Model:               MixedLM    Dependent Variable:    frequency
No. Observations:    83         Method:                REML     
No. Groups:          7          Scale:                 1110.3788
Min. group size:     11         Likelihood:            -402.5003
Max. group size:     12         Converged:             Yes      
Mean group size:     11.9                                       
----------------------------------------------------------------
                 Coef.   Std.Err.    z    P>|z|  [0.025   0.975]
----------------------------------------------------------------
Intercept        256.892    8.120  31.637 0.000  240.977 272.807
attitude[T.pol]  -19.807    7.319  -2.706 0.007  -34.153  -5.462
gender[T.M]     -108.603    7.319 -14.838 0.000 -122.948 -94.257
Intercept RE     182.718    5.502                               
================================================================

I have no idea what's going on. I feel like I am missing something foundational in the statistics of the problem.

like image 227
Sourav Dey Avatar asked Aug 25 '16 18:08

Sourav Dey


2 Answers

You are trying to fit a model with crossed random effects, i.e., you want to allow for consistent variation among subjects across scenarios as well as consistent variation among scenarios across subjects. You can use multiple random-effects terms in statsmodels, but they must be nested. Fitting crossed (as opposed to nested) random effects requires more sophisticated algorithms, and indeed the statsmodels documentation says (as of 25 Aug 2016, emphasis added):

Some limitations of the current implementation are that it does not support structure more complex on the residual errors (they are always homoscedastic), and it does not support crossed random effects. We hope to implement these features for the next release.

As far as I can see, your choices are (1) fall back to a nested model (i.e. fit the model as though either scenario is nested within subject or vice versa - or try both and see if the difference matters); (2) fall back to lme4, either within R or via rpy2.

As always, you're entitled to a full refund of the money you paid to use statsmodels ...

like image 168
Ben Bolker Avatar answered Oct 13 '22 08:10

Ben Bolker


Multiple or crossed random intercepts crossed effects can be fit using variance components, which are implemented in a different way from the one-group mixed effects.

I don't find an example, and the documentation seems to be only partially updated.

The unit tests contain an example using the MixedLM formula interface:

https://github.com/statsmodels/statsmodels/blob/master/statsmodels/regression/tests/test_lme.py#L284

like image 37
Josef Avatar answered Oct 13 '22 07:10

Josef