Mixed effects logistic regression

I'm attempting to implement mixed effects logistic regression in python. As a point of comparison, I'm using the glmer function from the lme4 package in R.

I've found that the statsmodels module has a BinomialBayesMixedGLM that should be able to fit such a model. However, I've encountered a number of issues:

  1. I don't find the documentation for the statsmodels function entirely clear, so I'm not sure I'm using it appropriately.
  2. So far, my attempts have not produced results that replicate what I get when fitting the model with glmer in R.
  3. I expect the BinomialBayesMixedGLM function does not calculate p-values since it is Bayesian, but I can't seem to figure out how to access the full posterior distributions for the parameters.

As a test case, I'm using the titanic dataset available here.

import os
import pandas as pd
import statsmodels.genmod.bayes_mixed_glm as smgb

titanic = pd.read_csv(os.path.join(os.getcwd(), 'titanic.csv'))

# variance component spec: a random intercept for each level of Pclass
r = {"Pclass": '0 + Pclass'}
mod = smgb.BinomialBayesMixedGLM.from_formula('Survived ~ Age', r, titanic)
fit = mod.fit_map()  # Laplace (MAP) approximation to the posterior
fit.summary()

#           Type    Post. Mean  Post. SD       SD SD (LB) SD (UB)
# Intercept M           3.1623    0.3616            
# Age       M          -0.0380    0.0061            
# Pclass    V           0.0754    0.5669    1.078   0.347   3.351

However, aside from the slope for Age, this doesn't appear to match what I get in R with glmer(Survived ~ Age + (1 | Pclass), data = titanic, family = "binomial"):

Random effects:
 Groups Name        Variance Std.Dev.
 Pclass (Intercept) 0.8563   0.9254  
Number of obs: 887, groups:  Pclass, 3

Fixed effects:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.961780   0.573402   1.677   0.0935 .  
Age         -0.038708   0.006243  -6.200 5.65e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

So what error am I making when creating my model in Python? And, once that's addressed, how can I extract either the posteriors or p-values? Lastly, are there any Python implementations of mixed effects logistic regression that are more akin to the implementation in R?

asked Feb 25 '20 by YTD



1 Answer

I just had to do something similar in Python. As suggested in the comments, Pymer4 appears to offer a suitable approach (especially if you are already familiar with R). Using the example 'titanic' dataset referred to in the question:

import pandas as pd
from pymer4.models import Lmer

titanic = pd.read_csv('titanic.csv')

# same specification as the glmer call in the question: random intercept per Pclass
model = Lmer("Survived ~ Age + (1|Pclass)",
             data=titanic, family='binomial')

print(model.fit())

OUT:

Formula: Survived~Age+(1|Pclass)

Family: binomial     Inference: parametric

Number of observations: 887  Groups: {'Pclass': 3.0}

Log-likelihood: -525.812     AIC: 1057.624

Random effects:

               Name    Var    Std
Pclass  (Intercept)  0.856  0.925

No random effect correlations specified

Fixed effects:

             Estimate  2.5_ci  97.5_ci     SE     OR  OR_2.5_ci  OR_97.5_ci  \
(Intercept)     0.962  -0.162    2.086  0.573  2.616       0.85       8.050   
Age            -0.039  -0.051   -0.026  0.006  0.962       0.95       0.974   

              Prob  Prob_2.5_ci  Prob_97.5_ci  Z-stat  P-val  Sig  
(Intercept)  0.723        0.460         0.889   1.677  0.093    .  
Age          0.490        0.487         0.493  -6.200  0.000  *** 
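
If you need the numbers programmatically rather than as a printed summary, the fitted Lmer object should also store the results as pandas DataFrames (attribute names as I understand them from the pymer4 documentation; worth double-checking against your installed version):

# after model.fit(); attribute names as documented for pymer4's Lmer class
coefs = model.coefs          # fixed effects table (estimates, CIs, ORs, Z-stats, p-values)
ranef_var = model.ranef_var  # random effect variances and standard deviations
ranef = model.ranef          # per-Pclass random intercept deviations

print(coefs.loc['Age', ['Estimate', 'SE', 'P-val']])
print(ranef_var)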

Just as an additional comment (sorry for diverting from the main question): I ran this on an Ubuntu 20.04 machine with Python 3.8.8. Not sure if anyone else has encountered this problem, but when running the model above with Pymer4 the package threw an error (the same error appeared when I tried to reproduce a similar model from the Pymer4 documentation here):

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

referring to this line in the /pymer4/models/Lmer.py package file:

--> 444         if design_matrix:

I fixed this by changing that line to the following (not sure whether this is the most elegant or safest approach; happy to be corrected here):

if design_matrix.any():

This appeared to get the package running and provide R-equivalent results in the few cases that I tested.

A better approach suggested in the comments below may be:

if design_matrix is not None:

Thanks to Giacomo Petrillo for pointing that out; I have not tested it myself, however.
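
As for the posterior part of the original statsmodels question: if I read the BayesMixedGLMResults documentation correctly, the fitted result object exposes the approximate posterior moments directly, which is probably the closest you can get to the "full posterior" there, since fit_vb() uses a mean-field Gaussian approximation. A sketch only; I have not verified this against the titanic data:

import pandas as pd
import statsmodels.genmod.bayes_mixed_glm as smgb

titanic = pd.read_csv('titanic.csv')
mod = smgb.BinomialBayesMixedGLM.from_formula(
    'Survived ~ Age', {"Pclass": '0 + Pclass'}, titanic)

# fit_vb() approximates the posterior with independent Gaussians, so these
# means and standard deviations fully describe the approximate posterior
fit = mod.fit_vb()

print(fit.fe_mean, fit.fe_sd)    # fixed effects coefficients
print(fit.vcp_mean, fit.vcp_sd)  # logged variance-component standard deviations
print(fit.vc_mean, fit.vc_sd)    # random effects (one per Pclass level)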

answered Oct 03 '22 by Hal Koons