We upgraded our scikit-learn from the old 0.13-git to 0.14.1 and found that the behavior of our logistic regression classifier changed quite a bit. The two classifiers, trained on the same data, have different coefficients and thus often give different classification results.
As an experiment I used 5 data points (high-dimensional) to train the LR classifier, and the results are:
0.13-git:
clf.fit(data_test.data, y)
LogisticRegression(C=10, class_weight='auto', dual=False, fit_intercept=True,
intercept_scaling=1, penalty='l2', tol=0.0001)
np.sort(clf.coef_)
array([[-0.12442518, -0.11137502, -0.11137502, ..., 0.05428562,
0.07329358, 0.08178794]])
0.14.1:
clf1.fit(data_test.data, y)
LogisticRegression(C=10, class_weight='auto', dual=False, fit_intercept=True,
intercept_scaling=1, penalty='l2', random_state=None, tol=0.0001)
np.sort(clf1.coef_)
array([[-0.11702073, -0.10505662, -0.10505662, ..., 0.05630517,
0.07651478, 0.08534311]])
I would say the difference is quite big, on the order of 10^(-2). Obviously the data I used here is not ideal, because the dimensionality of the features is much larger than the number of entries. However, this is often the case in practice too. Does it have something to do with feature selection? How can I make the results the same as before? I understand the new results are not necessarily worse than the old ones, but the focus now is to make them as consistent as possible. Thanks.
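For concreteness, here is how I quantify the disagreement between the two fitted models (a minimal sketch; coef_013.npy and coef_0141.npy are placeholder files holding the coefficient arrays saved from each environment with np.save):

import numpy as np

# Coefficients exported from each scikit-learn environment via np.save(...)
coef_old = np.load("coef_013.npy")   # shape (1, n_features), from 0.13-git
coef_new = np.load("coef_0141.npy")  # shape (1, n_features), from 0.14.1

# Element-wise difference and a few summary statistics
diff = coef_new - coef_old
print("max abs difference: ", np.abs(diff).max())
print("mean abs difference:", np.abs(diff).mean())

# Fraction of features whose coefficient changed sign, which is what
# can flip individual predictions near the decision boundary
sign_flips = np.mean(np.sign(coef_old) != np.sign(coef_new))
print("fraction of sign flips:", sign_flips)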
From the release 0.13 changelog:

"Fixed class_weight support in svm.LinearSVC and linear_model.LogisticRegression by Andreas Müller. The meaning of class_weight was reversed as erroneously higher weight meant less positives of a given class in earlier releases."
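To see what that reversal means in practice, here is a sketch of the inverse-frequency weighting that class_weight='auto' is intended to apply (the exact formula inside scikit-learn may differ slightly between versions; this is just the standard heuristic, shown on toy labels):

import numpy as np

y = np.array([0, 0, 0, 0, 1])  # toy labels: 4 negatives, 1 positive

# Standard inverse-frequency heuristic: rarer classes get larger weights
counts = np.bincount(y)
n_classes = len(counts)
weights = len(y) / (n_classes * counts.astype(float))
print(dict(enumerate(weights)))  # {0: 0.625, 1: 2.5}

# The pre-fix behavior described in the changelog was effectively the
# opposite: a higher weight meant fewer positives of that class.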
However, that changelog entry is for version 0.13 itself, not a later release. Since you mention that you used 0.13-git, you may have been running a pre-release snapshot from before this fix landed; in that case, the fix would explain the change you observe.
Looking at your coefficients, their magnitudes are somewhat smaller in the new version, which fits at least loosely with the changelog's statement that the class weighting was previously applied incorrectly.
You might want to change the parameters of your new LogisticRegression(...) and try to adjust things a bit.
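If the goal is consistency across versions, one option (a sketch under my assumptions, not guaranteed to reproduce 0.13-git exactly) is to stop relying on class_weight='auto' and pass an explicit weight dict instead, so the weighting no longer depends on a version-dependent internal heuristic:

import numpy as np
from sklearn.linear_model import LogisticRegression

# y is your label vector; placeholder labels here, assumed to be 0..n_classes-1
y = np.array([0, 0, 0, 0, 1])
counts = np.bincount(y).astype(float)
weights = {c: len(y) / (len(counts) * counts[c]) for c in range(len(counts))}

# A fixed dict pins the weighting across library versions, unlike 'auto',
# whose behavior changed between releases
clf = LogisticRegression(C=10, class_weight=weights, penalty='l2', tol=0.0001)
# clf.fit(data_test.data, y)  # then fit exactly as before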