I noticed that sklearn's new CalibratedClassifierCV seems to underperform the direct base_estimator when the base_estimator is GradientBoostingClassifier (I haven't tested other classifiers). Interestingly, if make_classification's parameters are:
n_features = 10
n_informative = 3
n_classes = 2
then CalibratedClassifierCV seems to be the slight outperformer (log loss evaluation). However, with the following classification data set, CalibratedClassifierCV is generally the underperformer:
from sklearn.datasets import make_classification
from sklearn import ensemble
from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import log_loss
from sklearn import cross_validation

# Build a classification task using 30 informative features
X, y = make_classification(n_samples=1000,
                           n_features=100,
                           n_informative=30,
                           n_redundant=0,
                           n_repeated=0,
                           n_classes=9,
                           random_state=0,
                           shuffle=False)

skf = cross_validation.StratifiedShuffleSplit(y, 5)

for train, test in skf:
    X_train, X_test = X[train], X[test]
    y_train, y_test = y[train], y[test]

    # Calibrated classifier (3-fold internal calibration)
    clf = ensemble.GradientBoostingClassifier(n_estimators=100)
    clf_cv = CalibratedClassifierCV(clf, cv=3, method='isotonic')
    clf_cv.fit(X_train, y_train)
    probas_cv = clf_cv.predict_proba(X_test)
    cv_score = log_loss(y_test, probas_cv)

    # Plain classifier fit on the full training split
    clf = ensemble.GradientBoostingClassifier(n_estimators=100)
    clf.fit(X_train, y_train)
    probas = clf.predict_proba(X_test)
    clf_score = log_loss(y_test, probas)

    print 'calibrated score:', cv_score
    print 'direct clf score:', clf_score
    print
One run yielded:
Maybe I'm missing something about how CalibratedClassifierCV works, or am not using it correctly, but I was under the impression that, if anything, passing a classifier to CalibratedClassifierCV would result in improved performance relative to the base_estimator alone.
Can anyone explain this observed underperformance?
CalibratedClassifierCV. Probability calibration with isotonic regression or logistic regression. This class uses cross-validation to both estimate the parameters of a classifier and subsequently calibrate a classifier.
Usage. The CalibratedClassifierCV class is used to calibrate a classifier. CalibratedClassifierCV uses a cross-validation approach to ensure unbiased data is always used to fit the calibrator. The data is split into k (train_set, test_set) couples (as determined by cv).
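For reference, a minimal usage sketch under current scikit-learn (module paths from sklearn.model_selection rather than the older sklearn.cross_validation used in the question; the dataset sizes here are illustrative, not taken from the question):

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.calibration import CalibratedClassifierCV
from sklearn.model_selection import train_test_split
from sklearn.metrics import log_loss

X, y = make_classification(n_samples=1000, n_features=20, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# cv=3 splits X_train into 3 (train_set, test_set) couples internally:
# the base estimator is fit on each train part, the calibrator on the test part.
base = GradientBoostingClassifier(n_estimators=100)
calibrated = CalibratedClassifierCV(base, cv=3, method='sigmoid')
calibrated.fit(X_train, y_train)

print('calibrated log loss:', log_loss(y_test, calibrated.predict_proba(X_test)))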
Calibration is an important, albeit often overlooked, aspect of training machine learning classifiers. It gives insight into model uncertainty, which can later be communicated to end users or used in further processing of the model outputs.
A model with 70% accuracy and 0.7 confidence in each prediction is well calibrated; a model with 70% accuracy and 0.9 confidence in each prediction is ill-calibrated.
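As a concrete check of that distinction, here is a sketch using sklearn.calibration.calibration_curve on a binary problem of my own construction (the curve is defined for binary targets). For a well-calibrated model, the observed frequency of positives in each bin should track the mean predicted probability:

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.calibration import calibration_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, n_informative=5,
                           n_classes=2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = GradientBoostingClassifier(n_estimators=100).fit(X_train, y_train)
prob_pos = clf.predict_proba(X_test)[:, 1]

# For each confidence bin: observed fraction of positives vs. mean predicted probability.
frac_pos, mean_pred = calibration_curve(y_test, prob_pos, n_bins=10)
for f, p in zip(frac_pos, mean_pred):
    print('mean predicted prob: %.2f  observed frequency: %.2f' % (p, f))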
An alternative might be to hold out a separate validation set, which has the disadvantage of leaving less data for training. Also, if CalibratedClassifierCV should only be fit on models that were fit on a different training set, why would its default option be cv=3, which also fits the base estimator?
Calibrating a classifier is as easy as passing it to scikit-learn's CalibratedClassifierCV. The method argument can be either sigmoid (the default, i.e. Platt scaling via logistic regression) or isotonic. One can then draw the calibration curve for the new, calibrated model on top of the previous one.
There are two things mentioned in the CalibratedClassifierCV docs that hint at the ways it can be used:
base_estimator: If cv=prefit, the classifier must have been fit already on data.
cv: If "prefit" is passed, it is assumed that base_estimator has been fitted already and all data is used for calibration.
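Following that hint, here is a sketch of the cv='prefit' route (this is also the "validation set" alternative mentioned above: the base estimator is trained on one split, and a separate held-out split is used only to fit the calibrator). The split sizes are arbitrary choices of mine:

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.calibration import CalibratedClassifierCV
from sklearn.model_selection import train_test_split
from sklearn.metrics import log_loss

X, y = make_classification(n_samples=2000, n_features=20, n_informative=5,
                           n_classes=3, random_state=0)
# Three disjoint splits: one to fit the model, one to calibrate, one to evaluate.
X_fit, X_rest, y_fit, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_calib, X_test, y_calib, y_test = train_test_split(X_rest, y_rest, test_size=0.5,
                                                    random_state=0)

base = GradientBoostingClassifier(n_estimators=100).fit(X_fit, y_fit)

# With cv='prefit', fit() only trains the calibrator; the base model is left as-is.
calibrated = CalibratedClassifierCV(base, cv='prefit', method='sigmoid')
calibrated.fit(X_calib, y_calib)

print('prefit calibrated log loss:', log_loss(y_test, calibrated.predict_proba(X_test)))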
The probability calibration itself requires cross-validation, therefore the CalibratedClassifierCV trains a calibrated classifier per fold (in this case using StratifiedKFold), and takes the mean of the predicted probabilities from each classifier when you call predict_proba(). This could explain the effect.
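That averaging can be checked directly, assuming a scikit-learn version that exposes the fitted per-fold models through the calibrated_classifiers_ attribute; the sketch below verifies that predict_proba() equals the mean of the per-fold calibrated predictions on a small synthetic dataset of my own:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.calibration import CalibratedClassifierCV
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf_cv = CalibratedClassifierCV(GradientBoostingClassifier(n_estimators=50),
                                cv=3, method='isotonic')
clf_cv.fit(X_train, y_train)

# One calibrated sub-classifier per fold, each fit on 2/3 of X_train.
print(len(clf_cv.calibrated_classifiers_))  # 3

per_fold = np.array([c.predict_proba(X_test) for c in clf_cv.calibrated_classifiers_])
print(np.allclose(per_fold.mean(axis=0), clf_cv.predict_proba(X_test)))  # True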
My hypothesis is that if the training set is small with respect to the number of features and classes, the reduced training set for each sub-classifier affects performance and the ensembling does not make up for it (or makes it worse). Also, the GradientBoostingClassifier might already provide pretty good probability estimates from the start, as its loss function is optimized for probability estimation.
If that's correct, ensembling classifiers the same way as the CalibratedClassifierCV but without calibration should be worse than the single classifier. Also, the effect should disappear when using a larger number of folds for calibration.
To test that, I extended your script to increase the number of folds and to include the ensembled classifier without calibration, and I was able to confirm my predictions. A 10-fold calibrated classifier always performed better than the single classifier, and the uncalibrated ensemble was significantly worse. In my run, the 3-fold calibrated classifier also did not really perform worse than the single classifier, so this might also be an unstable effect. These are the detailed results on the same dataset:
This is the code from my experiment:
import numpy as np
from sklearn.datasets import make_classification
from sklearn import ensemble
from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import log_loss
from sklearn import cross_validation

X, y = make_classification(n_samples=1000,
                           n_features=100,
                           n_informative=30,
                           n_redundant=0,
                           n_repeated=0,
                           n_classes=9,
                           random_state=0,
                           shuffle=False)

skf = cross_validation.StratifiedShuffleSplit(y, 5)

for train, test in skf:
    X_train, X_test = X[train], X[test]
    y_train, y_test = y[train], y[test]

    # 3-fold calibrated classifier
    clf = ensemble.GradientBoostingClassifier(n_estimators=100)
    clf_cv = CalibratedClassifierCV(clf, cv=3, method='isotonic')
    clf_cv.fit(X_train, y_train)
    probas_cv = clf_cv.predict_proba(X_test)
    cv_score = log_loss(y_test, probas_cv)
    print 'calibrated score (3-fold):', cv_score

    # 10-fold calibrated classifier
    clf = ensemble.GradientBoostingClassifier(n_estimators=100)
    clf_cv = CalibratedClassifierCV(clf, cv=10, method='isotonic')
    clf_cv.fit(X_train, y_train)
    probas_cv = clf_cv.predict_proba(X_test)
    cv_score = log_loss(y_test, probas_cv)
    print 'calibrated score (10-fold):', cv_score

    # Train 3 classifiers on folds of the training set (not the test set) and
    # take the average probability, i.e. the ensembling that
    # CalibratedClassifierCV does internally, but without calibration.
    skf2 = cross_validation.StratifiedKFold(y_train, 3)
    probas_list = []
    for sub_train, sub_test in skf2:
        X_sub_train, X_sub_test = X_train[sub_train], X_train[sub_test]
        y_sub_train, y_sub_test = y_train[sub_train], y_train[sub_test]
        clf = ensemble.GradientBoostingClassifier(n_estimators=100)
        clf.fit(X_sub_train, y_sub_train)
        probas_list.append(clf.predict_proba(X_test))
    probas = np.mean(probas_list, axis=0)
    clf_ensemble_score = log_loss(y_test, probas)
    print 'uncalibrated ensemble clf (3-fold) score:', clf_ensemble_score

    # Single classifier fit on the full training split
    clf = ensemble.GradientBoostingClassifier(n_estimators=100)
    clf.fit(X_train, y_train)
    probas = clf.predict_proba(X_test)
    score = log_loss(y_test, probas)
    print 'direct clf score:', score
    print
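A side note, since the snippets above use the pre-0.18 API: under current scikit-learn and Python 3, the splitting boilerplate would look roughly like the sketch below (sklearn.cross_validation was replaced by sklearn.model_selection, the splitters now take the data in split(), and the Python 2 print statements become print(...) calls):

from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedShuffleSplit

X, y = make_classification(n_samples=1000, n_features=100, n_informative=30,
                           n_redundant=0, n_repeated=0, n_classes=9,
                           random_state=0, shuffle=False)

sss = StratifiedShuffleSplit(n_splits=5, random_state=0)
for train, test in sss.split(X, y):
    X_train, X_test = X[train], X[test]
    y_train, y_test = y[train], y[test]
    # ... fit, calibrate and score exactly as above ...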