I have some dataset: titanic
Doing this in R
glm(Survived ~ Sex, titanic, family = "binomial")
I get
(Intercept)     SexMale 
1.124321   -2.477825
R takes survived as positive outcome.
But when I'm doing the same in Python
sm.formula.glm("Survived ~ Sex", family=sm.families.Binomial(), data=titanic).fit()
I get negative results: i.e. Python takes not survived as positive outcome.
How can I adjust Python's glm function behavior so it will return the same result as R does?
You just need to set your reference group to either male or female (depending on what you're interested in):
With a small test dataset in R, the code and model summary looks like this:
df <- data.frame(c(0,0,1,1,0), c("Male", "Female", "Female", "Male", "Male"))
colnames(df) <- c("Survived", "Sex")
model <- glm(Survived ~ Sex, data=df, family="binomial")
summary(model)
Output:
Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.084e-16  1.414e+00   0.000    1.000
SexMale     -6.931e-01  1.871e+00  -0.371    0.711
To get something similar in Python/statsmodels:
import pandas as pd
import statsmodels.api as sm
df = pd.DataFrame({"Survived": [0,0,1,1,0],
                   "Sex": ["Male", "Female", "Female", "Male", "Male"]})
model = sm.formula.glm("Survived ~ C(Sex, Treatment(reference='Female'))",
                       family=sm.families.Binomial(), data=df).fit()
print(model.summary())
Which will give:
                                                    coef    std err          z      P>|z|      [0.025      0.975]
-----------------------------------------------------------------------------------------------------------------
Intercept                                      5.551e-16      1.414   3.93e-16      1.000      -2.772       2.772
C(Sex, Treatment(reference='Female'))[T.Male]    -0.6931      1.871     -0.371      0.711      -4.360       2.974
Notice the use of Treatment() to set the reference group. I've set it to Female in this case to match the R output, but with your dataset it might make more sense to use Male. Either way, its just an issue of being explicit about which group is used as reference.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With