
Why does CalibratedClassifierCV underperform a direct classifier?

I noticed that sklearn's new CalibratedClassifierCV seems to underperform the direct base_estimator when the base_estimator is GradientBoostingClassifier (I haven't tested other classifiers). Interestingly, if make_classification's parameters are:

n_features = 10
n_informative = 3
n_classes = 2

then CalibratedClassifierCV seems to slightly outperform the direct classifier (as evaluated by log loss).
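
For reference, a minimal sketch of that easier setting (the remaining make_classification arguments are assumed to match the script below):

from sklearn.datasets import make_classification

# easier setting: few features, few informative features, binary target
X, y = make_classification(n_samples=1000,   # assumed, matching the script below
                           n_features=10,
                           n_informative=3,
                           n_classes=2,
                           random_state=0,
                           shuffle=False)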

However, with the following classification dataset, CalibratedClassifierCV generally seems to be the underperformer:

from sklearn.datasets import make_classification
from sklearn import ensemble
from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import log_loss
from sklearn import cross_validation
# Build a multiclass classification task using 30 informative features

X, y = make_classification(n_samples=1000,
                           n_features=100,
                           n_informative=30,
                           n_redundant=0,
                           n_repeated=0,
                           n_classes=9,
                           random_state=0,
                           shuffle=False)

skf = cross_validation.StratifiedShuffleSplit(y, 5)

for train, test in skf:

    X_train, X_test = X[train], X[test]
    y_train, y_test = y[train], y[test]

    # calibrated version: wrap the boosted trees in 3-fold isotonic calibration
    clf = ensemble.GradientBoostingClassifier(n_estimators=100)
    clf_cv = CalibratedClassifierCV(clf, cv=3, method='isotonic')
    clf_cv.fit(X_train, y_train)
    probas_cv = clf_cv.predict_proba(X_test)
    cv_score = log_loss(y_test, probas_cv)

    # direct, uncalibrated classifier for comparison
    clf = ensemble.GradientBoostingClassifier(n_estimators=100)
    clf.fit(X_train, y_train)
    probas = clf.predict_proba(X_test)
    clf_score = log_loss(y_test, probas) 

    print 'calibrated score:', cv_score
    print 'direct clf score:', clf_score
    print

One run yielded:

[screenshot of output: the calibrated log-loss scores were generally higher (worse) than the direct classifier's]

Maybe I'm missing something about how CalibratedClassifierCV works, or am not using it correctly, but I was under the impression that if anything, passing a classifier to CalibratedClassifierCV would result in improved performance relative to the base_estimator alone.

Can anyone explain this observed underperformance?

asked May 17 '15 by Ryan


1 Answer

The probability calibration itself requires cross-validation, so CalibratedClassifierCV trains a calibrated classifier per fold (in this case using StratifiedKFold) and averages the predicted probabilities from those per-fold classifiers when you call predict_proba(). This could explain the effect.
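
Roughly, here is a simplified sketch of what that per-fold fitting and averaging amounts to. It is a hypothetical helper, not sklearn's actual implementation; it assumes one-vs-rest isotonic calibration per class followed by renormalization, which is approximately what CalibratedClassifierCV does internally:

import numpy as np
from sklearn import cross_validation
from sklearn.isotonic import IsotonicRegression
from sklearn.base import clone

def calibrated_predict_proba_sketch(base_clf, X_train, y_train, X_test, k=3):
    classes = np.unique(y_train)
    fold_probas = []
    for fit_idx, cal_idx in cross_validation.StratifiedKFold(y_train, n_folds=k):
        # fit the base estimator on k-1 folds
        clf = clone(base_clf).fit(X_train[fit_idx], y_train[fit_idx])
        # fit one isotonic calibrator per class on the held-out fold
        cal_probas = clf.predict_proba(X_train[cal_idx])
        calibrators = []
        for j, c in enumerate(classes):
            ir = IsotonicRegression(out_of_bounds='clip')
            ir.fit(cal_probas[:, j], (y_train[cal_idx] == c).astype(float))
            calibrators.append(ir)
        # calibrate the test-set probabilities and renormalize
        test_probas = clf.predict_proba(X_test)
        p = np.column_stack([calibrators[j].predict(test_probas[:, j])
                             for j in range(len(classes))])
        p /= np.clip(p.sum(axis=1, keepdims=True), 1e-12, None)
        fold_probas.append(p)
    # predict_proba() of the calibrated ensemble = mean over the k fold models
    return np.mean(fold_probas, axis=0)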

My hypothesis is that if the training set is small relative to the number of features and classes, the reduced training set available to each sub-classifier hurts performance, and the ensembling does not make up for it (or makes it worse). Also, the GradientBoostingClassifier may already provide fairly good probability estimates from the start, since its loss function is optimized for probability estimation.
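
To make that last point concrete: the default deviance loss that GradientBoostingClassifier minimizes is the negative log-likelihood, i.e. the same quantity that log_loss measures, so the base estimator is already trained to produce good probability estimates. A quick check with made-up probabilities (illustrative numbers only):

import numpy as np
from sklearn.metrics import log_loss

# made-up predicted probabilities for 3 samples and 2 classes
y_true = np.array([0, 1, 1])
probas = np.array([[0.9, 0.1],
                   [0.2, 0.8],
                   [0.3, 0.7]])

# log loss = mean negative log-likelihood of the true class
manual = -np.mean(np.log(probas[np.arange(len(y_true)), y_true]))
print 'hand-computed negative log-likelihood:', manual   # ~0.228
print 'sklearn log_loss:', log_loss(y_true, probas)      # same value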

If that's correct, ensembling classifiers the same way CalibratedClassifierCV does, but without calibration, should be worse than the single classifier. Also, the effect should disappear when a larger number of folds is used for calibration.

To test that, I extended your script to increase the number of folds and to include the ensembled classifier without calibration, and I was able to confirm my predictions. A 10-fold calibrated classifier always performed better than the single classifier, and the uncalibrated ensemble was significantly worse. In my run, the 3-fold calibrated classifier also did not really perform worse than the single classifier, so this may be an unstable effect. These are the detailed results on the same dataset:

[image: log-loss results from cross-validation]

This is the code from my experiment:

import numpy as np
from sklearn.datasets import make_classification
from sklearn import ensemble
from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import log_loss
from sklearn import cross_validation

X, y = make_classification(n_samples=1000,
                           n_features=100,
                           n_informative=30,
                           n_redundant=0,
                           n_repeated=0,
                           n_classes=9,
                           random_state=0,
                           shuffle=False)

skf = cross_validation.StratifiedShuffleSplit(y, 5)

for train, test in skf:

    X_train, X_test = X[train], X[test]
    y_train, y_test = y[train], y[test]

    clf = ensemble.GradientBoostingClassifier(n_estimators=100)
    clf_cv = CalibratedClassifierCV(clf, cv=3, method='isotonic')
    clf_cv.fit(X_train, y_train)
    probas_cv = clf_cv.predict_proba(X_test)
    cv_score = log_loss(y_test, probas_cv)
    print 'calibrated score (3-fold):', cv_score


    clf = ensemble.GradientBoostingClassifier(n_estimators=100)
    clf_cv = CalibratedClassifierCV(clf, cv=10, method='isotonic')
    clf_cv.fit(X_train, y_train)
    probas_cv = clf_cv.predict_proba(X_test)
    cv_score = log_loss(y_test, probas_cv)
    print 'calibrated score (10-fold):', cv_score

    # Train 3 classifiers on training-set folds and average their probabilities
    # (mimics CalibratedClassifierCV's ensembling, but without calibration)
    skf2 = cross_validation.StratifiedKFold(y_train, 3)
    probas_list = []
    for sub_train, sub_test in skf2:
        X_sub_train = X_train[sub_train]
        y_sub_train = y_train[sub_train]
        clf = ensemble.GradientBoostingClassifier(n_estimators=100)
        clf.fit(X_sub_train, y_sub_train)
        probas_list.append(clf.predict_proba(X_test))
    probas = np.mean(probas_list, axis=0)
    clf_ensemble_score = log_loss(y_test, probas)
    print 'uncalibrated ensemble clf (3-fold) score:', clf_ensemble_score

    clf = ensemble.GradientBoostingClassifier(n_estimators=100)
    clf.fit(X_train, y_train)
    probas = clf.predict_proba(X_test)
    score = log_loss(y_test, probas)
    print 'direct clf score:', score
    print
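
If the per-fold shrinkage of the training set really is the culprit, another option worth trying (I have not benchmarked it here) is to fit the base estimator once on most of the training data and calibrate it on a separate held-out slice via cv='prefit'. The snippet below is only a sketch meant to run with the X_train/y_train/X_test/y_test variables from one split of the loop above; the 80/20 split is an arbitrary choice:

from sklearn.cross_validation import train_test_split

# hold out a slice of the training data purely for calibration
X_fit, X_cal, y_fit, y_cal = train_test_split(X_train, y_train,
                                              test_size=0.2,
                                              random_state=0)

clf = ensemble.GradientBoostingClassifier(n_estimators=100)
clf.fit(X_fit, y_fit)

clf_prefit = CalibratedClassifierCV(clf, cv='prefit', method='isotonic')
clf_prefit.fit(X_cal, y_cal)    # only the calibrators are fit here
probas_prefit = clf_prefit.predict_proba(X_test)
print 'prefit-calibrated score:', log_loss(y_test, probas_prefit)

Note that with a calibration set this small and many classes, isotonic regression can easily overfit; method='sigmoid' is generally the safer choice when little calibration data is available.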

answered Oct 05 '22 by Alexander Bauer