I'm attempting to implement mixed effects logistic regression in Python. As a point of comparison, I'm using the glmer function from the lme4 package in R.
I've found that the statsmodels module has a BinomialBayesMixedGLM that should be able to fit such a model. However, I've encountered a number of issues:
1. I find the documentation for the statsmodels function to be not entirely helpful or clear, so I'm not completely sure how to use it appropriately.
2. So far, my attempts have not reproduced the results I get when fitting the model with glmer in R.
3. I expect the BinomialBayesMixedGLM function does not calculate p-values since it is Bayesian, but I can't seem to figure out how to access the full posterior distributions for the parameters.

As a test case, I'm using the titanic dataset available here.
import os

import pandas as pd
import statsmodels.genmod.bayes_mixed_glm as smgb

titanic = pd.read_csv(os.path.join(os.getcwd(), 'titanic.csv'))

# Variance component specification: one random effect per Pclass level
r = {"Pclass": '0 + Pclass'}
mod = smgb.BinomialBayesMixedGLM.from_formula('Survived ~ Age', r, titanic)
fit = mod.fit_map()  # Laplace (MAP) approximation to the posterior
fit.summary()
# Type Post. Mean Post. SD SD SD (LB) SD (UB)
# Intercept M 3.1623 0.3616
# Age M -0.0380 0.0061
# Pclass V 0.0754 0.5669 1.078 0.347 3.351
However, aside from the slope for Age, this doesn't appear to match what I get in R with glmer(Survived ~ Age + (1 | Pclass), data = titanic, family = "binomial"):
Random effects:
Groups Name Variance Std.Dev.
Pclass (Intercept) 0.8563 0.9254
Number of obs: 887, groups: Pclass, 3
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.961780 0.573402 1.677 0.0935 .
Age -0.038708 0.006243 -6.200 5.65e-10 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
So what error am I making when creating my model in Python? And, once that's addressed, how can I extract either the posteriors or p-values? Lastly, are there any Python implementations of mixed effects logistic regression that are more akin to the implementation in R?
For background: a mixed effects (multilevel) model predicts an outcome from one or more explanatory variables while allowing both fixed and random effects. Linear mixed models extend simple linear models in this way and are particularly useful when there is non-independence in the data, for example because of a hierarchical or grouped structure. Mixed effects logistic regression applies the same idea when the outcome is binary, modelling the relationship between explanatory variables and a binary outcome whose observations are not independent. It is sometimes also called repeated measures logistic regression, multilevel logistic regression, or multilevel binary logistic regression.
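If it helps to see the model written out: the random intercept logistic regression discussed here models observation i in group j (here, passenger class) as

\operatorname{logit}\big(\Pr(y_{ij} = 1)\big) = \beta_0 + \beta_1 x_{ij} + u_j, \qquad u_j \sim N(0, \sigma_u^2)

where u_j is the group-specific intercept offset; this is exactly what the lme4-style formula Survived ~ Age + (1 | Pclass) specifies.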
Just had to do something similar with Python. As suggested in the comments, Pymer4 appeared to offer a suitable approach (especially if you are familiar with R anyway).
Using the example dataset 'titanic' referred to in the question:
from pymer4.models import Lmer

# Random intercept for Pclass, binomial (logistic) family, matching the glmer call
model = Lmer("Survived ~ Age + (1|Pclass)",
             data=titanic, family='binomial')
print(model.fit())
OUT:
Formula: Survived~Age+(1|Pclass)
Family: binomial Inference: parametric
Number of observations: 887 Groups: {'Pclass': 3.0}
Log-likelihood: -525.812 AIC: 1057.624
Random effects:
Name Var Std
Pclass (Intercept) 0.856 0.925
No random effect correlations specified
Fixed effects:
Estimate 2.5_ci 97.5_ci SE OR OR_2.5_ci OR_97.5_ci \
(Intercept) 0.962 -0.162 2.086 0.573 2.616 0.85 8.050
Age -0.039 -0.051 -0.026 0.006 0.962 0.95 0.974
Prob Prob_2.5_ci Prob_97.5_ci Z-stat P-val Sig
(Intercept) 0.723 0.460 0.889 1.677 0.093 .
Age 0.490 0.487 0.493 -6.200 0.000 ***
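Beyond the printed summary, Pymer4 stores the results on the fitted model object, so they can be pulled out as pandas objects for further use. The attribute names below are from my reading of the Pymer4 documentation, so double-check them against your installed version:

# Fixed effects table (estimates, CIs, odds ratios, p-values) as a DataFrame
print(model.coefs)

# Estimated random intercept offsets (one per Pclass)
print(model.ranef)

# Per-group coefficients, i.e. the fixed effects combined with each group's offset
print(model.fixef)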
Just as an additional comment (sorry for diverting from the main question), I ran this on an Ubuntu 20.04 machine with Python 3.8.8. Not sure if anyone else encountered this problem, but when running the model above with Pymer4 the package threw an error (the same error occurred when I tried to reproduce a similar model from the Pymer4 documentation here):
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
referring to this line in the /pymer4/models/Lmer.py package file:
--> 444 if design_matrix:
I fixed this by changing that line to the following (not sure if this is the most elegant or safest approach, happy to be corrected here):
if design_matrix.any():
This appeared to get the package running and provide R-equivalent results in the few cases that I tested.
A better approach, suggested in the comments below, may be:
if design_matrix is not None:
Thanks to Giacomo Petrillo for pointing that out; I have not tested it myself, however.
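For anyone curious why the original check fails: the error message is NumPy's, so design_matrix is evidently an array, and Python cannot decide whether a multi-element array should count as True or False. A minimal, standalone illustration of the two checks (this is not pymer4 code, just a demonstration):

import numpy as np

design_matrix = np.array([[1.0, 22.0], [1.0, 38.0]])

try:
    if design_matrix:  # ambiguous: should this be True if any element is, or only if all are?
        pass
except ValueError as err:
    print(err)  # the same "truth value of an array ... is ambiguous" message

# `is not None` tests whether a design matrix was supplied at all,
# which is presumably what the pymer4 code intends:
if design_matrix is not None:
    print("design matrix shape:", design_matrix.shape)

# `.any()` instead asks whether any entry is non-zero; it happens to work here
# but would be False for an all-zero matrix:
print(design_matrix.any())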