I am trying to use the Python statsmodels linear mixed effects model to fit a model that has two random intercepts, e.g. two groups. I cannot figure out how to initialize the model so that I can do this.
Here's the example. I have data that looks like the following (taken from here):
subject gender scenario attitude frequency
F1 F 1 pol 213.3
F1 F 1 inf 204.5
F1 F 2 pol 285.1
F1 F 2 inf 259.7
F1 F 3 pol 203.9
F1 F 3 inf 286.9
F1 F 4 pol 250.8
F1 F 4 inf 276.8
I want to make a linear mixed effects model with two random effects -- one for the subject group and one for the scenario group. I am trying to do this:
import statsmodels.api as sm
model = sm.MixedLM.from_formula("frequency ~ attitude + gender", data, groups=data[['subject', 'scenario']])
result = model.fit()
print result.summary()
I keep getting this error:
LinAlgError: Singular matrix
It works fine in R. When I use lme4
in R with the formula-based rendering it fits just fine:
politeness.model = lmer(frequency ~ attitude + gender +
(1|subject) + (1|scenario), data=politeness)
I don't understand why this is happening. It works when I use any one random effect/group, e.g.
model = sm.MixedLM.from_formula("frequency ~ attitude + gender", data, groups=data['subject'])
Then I get:
Mixed Linear Model Regression Results
===============================================================
Model: MixedLM Dependent Variable: frequency
No. Observations: 83 Method: REML
No. Groups: 6 Scale: 850.9456
Min. group size: 13 Likelihood: -393.3720
Max. group size: 14 Converged: Yes
Mean group size: 13.8
---------------------------------------------------------------
Coef. Std.Err. z P>|z| [0.025 0.975]
---------------------------------------------------------------
Intercept 256.785 15.226 16.864 0.000 226.942 286.629
attitude[T.pol] -19.415 6.407 -3.030 0.002 -31.972 -6.858
gender[T.M] -108.325 21.064 -5.143 0.000 -149.610 -67.041
Intercept RE 603.948 23.995
===============================================================
Alternatively, if I do:
model = sm.MixedLM.from_formula("frequency ~ attitude + gender", data, groups=data['scenario'])
This is the result I get:
Mixed Linear Model Regression Results
================================================================
Model: MixedLM Dependent Variable: frequency
No. Observations: 83 Method: REML
No. Groups: 7 Scale: 1110.3788
Min. group size: 11 Likelihood: -402.5003
Max. group size: 12 Converged: Yes
Mean group size: 11.9
----------------------------------------------------------------
Coef. Std.Err. z P>|z| [0.025 0.975]
----------------------------------------------------------------
Intercept 256.892 8.120 31.637 0.000 240.977 272.807
attitude[T.pol] -19.807 7.319 -2.706 0.007 -34.153 -5.462
gender[T.M] -108.603 7.319 -14.838 0.000 -122.948 -94.257
Intercept RE 182.718 5.502
================================================================
I have no idea what's going on. I feel like I am missing something foundational in the statistics of the problem.
You are trying to fit a model with crossed random effects, i.e., you want to allow for consistent variation among subjects across scenarios as well as consistent variation among scenarios across subjects. You can use multiple random-effects terms in statsmodels, but they must be nested. Fitting crossed (as opposed to nested) random effects requires more sophisticated algorithms, and indeed the statsmodels documentation says (as of 25 Aug 2016, emphasis added):
Some limitations of the current implementation are that it does not support structure more complex on the residual errors (they are always homoscedastic), and it does not support crossed random effects. We hope to implement these features for the next release.
As far as I can see, your choices are (1) fall back to a nested model (i.e. fit the model as though either scenario is nested within subject or vice versa - or try both and see if the difference matters); (2) fall back to lme4
, either within R or via rpy2.
As always, you're entitled to a full refund of the money you paid to use statsmodels ...
Multiple or crossed random intercepts crossed effects can be fit using variance components, which are implemented in a different way from the one-group mixed effects.
I don't find an example, and the documentation seems to be only partially updated.
The unit tests contain an example using the MixedLM formula interface:
https://github.com/statsmodels/statsmodels/blob/master/statsmodels/regression/tests/test_lme.py#L284
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With