
Different versions of sklearn give quite different training results

We upgraded our sklearn from the old 0.13-git to 0.14.1, and found that the behavior of our logistic regression classifier changed quite a bit. The two classifiers, trained on the same data, have different coefficients, and thus often give different classification results.

As an experiment I used 5 data points (high dimensional) to train the LR classifier, and the results are:

0.13-git:

```python
>>> clf.fit(data_test.data, y)
LogisticRegression(C=10, class_weight='auto', dual=False, fit_intercept=True,
          intercept_scaling=1, penalty='l2', tol=0.0001)
>>> np.sort(clf.coef_)
array([[-0.12442518, -0.11137502, -0.11137502, ...,  0.05428562,
         0.07329358,  0.08178794]])
```

0.14.1:

```python
>>> clf1.fit(data_test.data, y)
LogisticRegression(C=10, class_weight='auto', dual=False, fit_intercept=True,
          intercept_scaling=1, penalty='l2', random_state=None, tol=0.0001)
>>> np.sort(clf1.coef_)
array([[-0.11702073, -0.10505662, -0.10505662, ...,  0.05630517,
         0.07651478,  0.08534311]])
```

I would say the difference is quite big, on the order of 10^(-2). Obviously the data I used here is not ideal, because the number of features is much larger than the number of samples. However, this is often the case in practice as well. Does it have something to do with feature selection? How can I make the results the same as before? I understand the new results are not necessarily worse than the old ones, but the focus now is to make them as consistent as possible. Thanks.
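As a first sanity check before comparing across versions, it helps to confirm that a single sklearn version gives reproducible coefficients on the same data. The sketch below uses synthetic stand-in data (5 samples, 50 features, mimicking the n_features >> n_samples setup described above); the array shapes and parameter values are assumptions for illustration, not the original data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Tiny synthetic stand-in: 5 samples, 50 features (n_features >> n_samples).
rng = np.random.RandomState(0)
X = rng.randn(5, 50)
y = np.array([0, 0, 1, 1, 1])

# Fixing random_state removes run-to-run nondeterminism within a single
# sklearn version; it cannot undo a deliberate cross-version change.
clf_a = LogisticRegression(C=10, penalty="l2", tol=1e-4, random_state=0).fit(X, y)
clf_b = LogisticRegression(C=10, penalty="l2", tol=1e-4, random_state=0).fit(X, y)

# Same version, same seed: coefficients agree to numerical precision.
print(np.allclose(clf_a.coef_, clf_b.coef_))
```

If this prints True on each version separately but the coefficients still differ between versions, the discrepancy comes from a change in the library itself rather than from training noise.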

asked Apr 18 '15 by ymeng


1 Answer

From the release 0.13 changelog:

Fixed class_weight support in svm.LinearSVC and linear_model.LogisticRegression by Andreas Müller. The meaning of class_weight was reversed as erroneously higher weight meant less positives of a given class in earlier releases.

However, this entry appears in the changelog for version 0.13 itself, not a later release. Since you were running 0.13-git, you may have been using a pre-release of 0.13 that predated this fix; that would explain why upgrading to 0.14.1 changed your results.

Looking at your coefficients, they are slightly smaller in magnitude in the new version, which is consistent with the fix reversing the meaning of class_weight.

You might want to adjust the parameters of your new LogisticRegression(...), in particular class_weight, and see whether you can get closer to the old behavior.
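One way to probe the class_weight hypothesis is to compute the automatic weights by hand and also fit with their reciprocals, which is roughly what a reversed weighting would apply. This is an experiment under stated assumptions, not a documented equivalence; the data, weights, and C value are illustrative (note that the old 'auto' option corresponds to 'balanced' in modern sklearn):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Imbalanced toy data: 15 samples of class 0, 5 of class 1.
rng = np.random.RandomState(0)
X = rng.randn(20, 5)
y = np.array([0] * 15 + [1] * 5)

# What class_weight='balanced' computes: n_samples / (n_classes * count(c)).
classes, counts = np.unique(y, return_counts=True)
balanced = {c: len(y) / (len(classes) * n) for c, n in zip(classes, counts)}

# If the old release effectively applied the reverse weighting, passing the
# reciprocal weights explicitly is one way to approximate it.
inverted = {c: 1.0 / w for c, w in balanced.items()}

clf_new = LogisticRegression(C=10, class_weight=balanced).fit(X, y)
clf_old_like = LogisticRegression(C=10, class_weight=inverted).fit(X, y)

# Both binary models expose coefficients of shape (1, n_features).
print(clf_new.coef_.shape, clf_old_like.coef_.shape)  # (1, 5) (1, 5)
```

Comparing clf_old_like.coef_ against the coefficients saved from the old version would show whether the reversed weighting accounts for the difference you observed.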

answered Sep 29 '22 by Guillaume Chevalier