 

Different coefficients: scikit-learn vs statsmodels (logistic regression)

When running a logistic regression, the coefficients I get from statsmodels are correct (I verified them against some course material). However, I can't get the same coefficients with sklearn; I've tried preprocessing the data to no avail. This is my code:

Statsmodels:

import statsmodels.api as sm

# statsmodels does not add an intercept automatically, so add the constant column explicitly
X_const = sm.add_constant(X)
model = sm.Logit(y, X_const)
results = model.fit()
print(results.summary())

The relevant output is:

                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const      -0.2382      3.983     -0.060      0.952      -8.045       7.569
a           2.0349      0.837      2.430      0.015       0.393       3.676
b           0.8077      0.823      0.981      0.327      -0.806       2.421
c           1.4572      0.768      1.897      0.058      -0.049       2.963
d          -0.0522      0.063     -0.828      0.407      -0.176       0.071
e_2         0.9157      1.082      0.846      0.397      -1.205       3.037
e_3         2.0080      1.052      1.909      0.056      -0.054       4.070

Scikit-learn (no preprocessing):

from sklearn.linear_model import LogisticRegression

# fit_intercept=True by default, so no constant column is needed here
model = LogisticRegression()
results = model.fit(X, y)
print(results.coef_)
print(results.intercept_)

The coefficients given are:

array([[ 1.29779008,  0.56524976,  0.97268593, -0.03762884,  0.33646097,
         0.98020901]])

And the intercept/constant given is:

array([ 0.0949539])

As you can see, regardless of which coefficient corresponds to which variable, the numbers given by sklearn don't match the correct ones from statsmodels. What am I missing? Thanks in advance!

asked May 19 '18 by lfo


People also ask

Is statsmodels better than Sklearn?

Scikit-learn and pandas are more actively developed than statsmodels, and for most machine-learning workflows scikit-learn is the more convenient choice thanks to its simple, consistent API.

What is the difference between statsmodels and Sklearn linear regression?

Linear regression is, in its basic form, the same in statsmodels and in scikit-learn. However, the implementations differ, which can produce different results in edge cases, and scikit-learn generally has more support for larger models. For example, statsmodels currently uses sparse matrices in very few parts.
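To illustrate that the basic estimators agree when no regularisation is involved, here is a minimal sketch on made-up toy data (the variables and data are purely illustrative, not from the question):

import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LinearRegression

# Toy data, purely for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 0.5 + 1.5 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

# statsmodels needs the intercept column added explicitly
ols = sm.OLS(y, sm.add_constant(X)).fit()
print(ols.params)              # [intercept, coef_1, coef_2]

# scikit-learn adds the intercept itself (fit_intercept=True by default)
lin = LinearRegression().fit(X, y)
print(lin.intercept_, lin.coef_)

Because ordinary least squares is unregularised in both libraries, the two sets of estimates match, unlike the logistic-regression case in the question.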

Which solver is best for logistic regression?

The solvers implemented in the LogisticRegression class are "liblinear", "newton-cg", "lbfgs", "sag" and "saga". In a nutshell, the "saga" solver is often the best choice, while "liblinear" was kept as the default for historical reasons (newer scikit-learn releases default to "lbfgs").
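As a rough sketch of choosing a solver explicitly (the bundled breast-cancer dataset is used purely for illustration):

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# "saga" supports every penalty type and scales to large data, but it
# converges much faster when the features are standardized first
clf = make_pipeline(StandardScaler(), LogisticRegression(solver='saga', max_iter=5000))
clf.fit(X, y)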

What is statsmodels logit?

Statsmodels provides the Logit() class for performing logistic regression. Logit() accepts y and X as parameters and returns a Logit model object; calling fit() then estimates the model on the data.


1 Answer

Thanks to a kind soul on reddit, this was solved. To get the same coefficients, one has to effectively switch off the regularisation that sklearn applies to logistic regression by default, by making C very large:

model = LogisticRegression(C=1e8)

According to the documentation, C is:

C : float, default: 1.0

Inverse of regularization strength; must be a positive float. Like in support vector machines, smaller values specify stronger regularization.
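Putting it together, a minimal sketch of the fix, assuming X and y are the same data used in the question (the penalty=None option mentioned in the comment only exists in recent scikit-learn releases):

import statsmodels.api as sm
from sklearn.linear_model import LogisticRegression

# statsmodels: plain (unregularised) maximum-likelihood fit
sm_results = sm.Logit(y, sm.add_constant(X)).fit(disp=0)

# scikit-learn: make C huge so the default L2 penalty becomes negligible
# (recent releases also accept penalty=None to switch it off entirely)
sk_model = LogisticRegression(C=1e8, max_iter=1000).fit(X, y)

print(sm_results.params)
print(sk_model.intercept_, sk_model.coef_)   # should now match to several decimals

With the penalty effectively disabled, scikit-learn solves the same unregularised maximum-likelihood problem as statsmodels, so the coefficients line up.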

answered by lfo