Mixed effects logistic regression

I'm attempting to implement mixed effects logistic regression in python. As a point of comparison, I'm using the glmer function from the lme4 package in R.

I've found that the statsmodels module has a BinomialBayesMixedGLM that should be able to fit such a model. However, I've encountered a number of issues:

  1. I don't find the documentation for the statsmodels function entirely clear, so I'm not sure I'm using it appropriately.
  2. So far, my attempts have not produced results that replicate what I get when fitting the model with glmer in R.
  3. I expect the BinomialBayesMixedGLM function does not calculate p-values since it is Bayesian, but I can't seem to figure out how to access the full posterior distributions for the parameters.

As a test case, I'm using the titanic dataset available here.

import os
import pandas as pd
import statsmodels.genmod.bayes_mixed_glm as smgb

titanic = pd.read_csv(os.path.join(os.getcwd(), 'titanic.csv'))

# variance component spec: a random intercept for each level of Pclass
r = {"Pclass": '0 + Pclass'}
mod = smgb.BinomialBayesMixedGLM.from_formula('Survived ~ Age', r, titanic)
fit = mod.fit_map()  # Laplace (MAP) approximation to the posterior
fit.summary()

#           Type    Post. Mean  Post. SD       SD SD (LB) SD (UB)
# Intercept M           3.1623    0.3616            
# Age       M          -0.0380    0.0061            
# Pclass    V           0.0754    0.5669    1.078   0.347   3.351

However, aside from the slope for Age, this doesn't appear to match what I get in R with glmer(Survived ~ Age + (1 | Pclass), data = titanic, family = "binomial"):

Random effects:
 Groups Name        Variance Std.Dev.
 Pclass (Intercept) 0.8563   0.9254  
Number of obs: 887, groups:  Pclass, 3

Fixed effects:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.961780   0.573402   1.677   0.0935 .  
Age         -0.038708   0.006243  -6.200 5.65e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

So what error am I making when creating my model in Python? And, once that's addressed, how can I extract either the posteriors or p-values? Lastly, are there any Python implementations of mixed effects logistic regression that are more akin to the implementation in R?

asked Feb 25 '20 by YTD



1 Answer

I just had to do something similar in Python. As suggested in the comments, Pymer4 appears to offer a suitable approach (especially if you are already familiar with R). Using the example 'titanic' dataset referred to in the question:

import pandas as pd
from pymer4.models import Lmer

titanic = pd.read_csv('titanic.csv')

# same specification as the glmer call in the question: random intercept per Pclass
model = Lmer("Survived ~ Age + (1|Pclass)",
             data=titanic, family='binomial')

print(model.fit())

OUT:

Formula: Survived~Age+(1|Pclass)

Family: binomial     Inference: parametric

Number of observations: 887  Groups: {'Pclass': 3.0}

Log-likelihood: -525.812     AIC: 1057.624

Random effects:

               Name    Var    Std
Pclass  (Intercept)  0.856  0.925

No random effect correlations specified

Fixed effects:

             Estimate  2.5_ci  97.5_ci     SE     OR  OR_2.5_ci  OR_97.5_ci  \
(Intercept)     0.962  -0.162    2.086  0.573  2.616       0.85       8.050   
Age            -0.039  -0.051   -0.026  0.006  0.962       0.95       0.974   

              Prob  Prob_2.5_ci  Prob_97.5_ci  Z-stat  P-val  Sig  
(Intercept)  0.723        0.460         0.889   1.677  0.093    .  
Age          0.490        0.487         0.493  -6.200  0.000  *** 
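
If you need the numbers programmatically rather than as a printed summary, the fitted Lmer object should also store the results as pandas DataFrames (attribute names as I understand them from the pymer4 documentation; worth double-checking against your installed version):

# after model.fit(); attribute names as documented for pymer4's Lmer class
coefs = model.coefs          # fixed effects table (estimates, CIs, ORs, Z-stats, p-values)
ranef_var = model.ranef_var  # random effect variances and standard deviations
ranef = model.ranef          # per-Pclass random intercept deviations

print(coefs.loc['Age', ['Estimate', 'SE', 'P-val']])
print(ranef_var)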

Just as an additional comment (sorry for diverting from the main question): I ran this on an Ubuntu 20.04 machine with Python 3.8.8. Not sure if anyone else has encountered this problem, but when running the model above with Pymer4 the package threw an error (the same error appeared when I tried to reproduce a similar model from the Pymer4 documentation here):

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

referring to this line in the /pymer4/models/Lmer.py package file:

--> 444         if design_matrix:

I fixed this by changing that line to the following (not sure whether this is the most elegant or safest approach; happy to be corrected here):

if design_matrix.any():

This appeared to get the package running and provide R-equivalent results in the few cases that I tested.

A better approach suggested in the comments below may be:

if design_matrix is not None:

Thanks to Giacomo Petrillo for pointing that out; I have not tested it myself, however.
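
As for the posterior part of the original statsmodels question: if I read the BayesMixedGLMResults documentation correctly, the fitted result object exposes the approximate posterior moments directly, which is probably the closest you can get to the "full posterior" there, since fit_vb() uses a mean-field Gaussian approximation. A sketch only; I have not verified this against the titanic data:

import pandas as pd
import statsmodels.genmod.bayes_mixed_glm as smgb

titanic = pd.read_csv('titanic.csv')
mod = smgb.BinomialBayesMixedGLM.from_formula(
    'Survived ~ Age', {"Pclass": '0 + Pclass'}, titanic)

# fit_vb() approximates the posterior with independent Gaussians, so these
# means and standard deviations fully describe the approximate posterior
fit = mod.fit_vb()

print(fit.fe_mean, fit.fe_sd)    # fixed effects coefficients
print(fit.vcp_mean, fit.vcp_sd)  # logged variance-component standard deviations
print(fit.vc_mean, fit.vc_sd)    # random effects (one per Pclass level)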

answered Oct 03 '22 by Hal Koons