 

sklearn.linear_model.LogisticRegression returns different coefficients every time although random_state is set

I'm fitting a logistic regression model and am setting the random state to a fixed value.

Every time I call fit I get different coefficients. For example:

classifier_instance.fit(train_examples_features, train_examples_labels)
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, penalty='l2', random_state=1, tol=0.0001)

>>> classifier_instance.raw_coef_
array([[ 0.071101940040772596  ,  0.05143724979709707323,  0.071101940040772596  , -0.04089477198935181912, -0.0407380696457252528 ,  0.03622160087086594843,  0.01055345545606742319,
         0.01071861708285645406, -0.36248634699444892693, -0.06159019047096317423,  0.02370064668025737009,  0.02370064668025737009, -0.03159781822495803805,  0.11221150783553821006,
         0.02728295348681779309,  0.071101940040772596  ,  0.071101940040772596  ,  0.                    ,  0.10882033432637286396,  0.64630314505709030026,  0.09617956519989406816,
         0.0604133873444507169 ,  0.                    ,  0.04111685986987245051,  0.                    ,  0.                    ,  0.18312324521915510078,  0.071101940040772596  ,
         0.071101940040772596  ,  0.                    , -0.59561802045324663268, -0.61490898457874587635,  1.07812569991461248975,  0.071101940040772596  ]])

classifier_instance.fit(train_examples_features, train_examples_labels)
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, penalty='l2', random_state=1, tol=0.0001)

>>> classifier_instance.raw_coef_
array([[ 0.07110193825129411394,  0.05143724970282205489,  0.07110193825129411394, -0.04089477178162870957, -0.04073806899140903354,  0.03622160048165772028,  0.010553455400928528  ,
         0.01071860364222424096, -0.36248635488413910588, -0.06159021545062405567,  0.02370064608376460866,  0.02370064608376460866, -0.03159783710841745225,  0.11221149816037970237,
         0.02728295411479400578,  0.07110193825129411394,  0.07110193825129411394,  0.                    ,  0.10882033461822394893,  0.64630314701686075729,  0.09617956493834901865,
         0.06041338563697066372,  0.                    ,  0.04111676713793514099,  0.                    ,  0.                    ,  0.18312324401049043243,  0.07110193825129411394,
         0.07110193825129411394,  0.                    , -0.59561803345113684127, -0.61490899867901249731,  1.07812569539027203191,  0.07110193825129411394]])

I'm using version 0.14. The docs state: "The underlying C implementation uses a random number generator to select features when fitting the model. It is thus not uncommon, to have slightly different results for the same input data. If that happens, try with a smaller tol parameter."

I thought that setting the random state would guarantee there is no randomness, but apparently this is not the case. Is this a bug or desired behavior?
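For reference, a minimal self-contained sketch of the comparison; the make_classification data and the solver='liblinear' argument are my own stand-ins (liblinear is the only backend in 0.14, where the constructor has no solver argument), not the asker's setup:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-in for train_examples_features / train_examples_labels
X, y = make_classification(n_samples=200, n_features=34, random_state=0)

# solver='liblinear' mirrors the 0.14 backend; newer versions default to
# other solvers that do not show this behavior
clf = LogisticRegression(C=1.0, penalty='l2', tol=0.0001, random_state=1,
                         solver='liblinear')

first = clf.fit(X, y).coef_.copy()
second = clf.fit(X, y).coef_.copy()

# With liblinear the two runs can differ in the trailing decimals
print(np.max(np.abs(first - second)))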

asked Jun 26 '14 by jonathans




2 Answers

It's not really desired, but it's a known issue that is very hard to fix. The thing is that LogisticRegression models are trained with Liblinear, which does not allow setting its random seed in a completely robust way. When you explicitly set the random_state, a best effort is made to set Liblinear's random seed, but that may fail.
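A sketch of the workaround the 0.14 docs themselves suggest (quoted in the question): tighten tol so liblinear runs closer to convergence, which should shrink, though not necessarily eliminate, the run-to-run differences. The synthetic data below is a placeholder, not the asker's:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=34, random_state=0)

# Much smaller tol than the default 1e-4, as the docs recommend
clf = LogisticRegression(C=1.0, penalty='l2', tol=1e-8, random_state=1,
                         solver='liblinear')

coefs = [clf.fit(X, y).coef_.copy() for _ in range(2)]
print(np.max(np.abs(coefs[0] - coefs[1])))  # should now be negligible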

answered Sep 28 '22 by Fred Foo


I was baffled by the problem as well, but eventually found that it was also necessary to call numpy.random.seed() to set the state of numpy's internal RNG, in addition to passing random_state.

This was tested with sklearn 0.13.1.
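A sketch of that workaround, seeding numpy's global RNG before each fit in addition to passing random_state; the data is synthetic, and whether this fully removes the differences may depend on the sklearn version:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=34, random_state=0)

def fit_coefs():
    # Seed numpy's global RNG as well, per this answer
    np.random.seed(1)
    clf = LogisticRegression(C=1.0, penalty='l2', random_state=1)
    return clf.fit(X, y).coef_.copy()

print(np.array_equal(fit_coefs(), fit_coefs()))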

answered Sep 28 '22 by Marcus Gröber