I noticed that sklearn's new CalibratedClassifierCV seems to underperform the direct base_estimator when the base_estimator is GradientBoostingClassifier (I haven't tested other classifiers). Interestingly, if make_classification's parameters are:
n_features = 10
n_informative = 3
n_classes = 2
then CalibratedClassifierCV seems to be the slight outperformer (log loss evaluation). However, with the following classification data set, CalibratedClassifierCV is generally the underperformer:
from sklearn.datasets import make_classification
from sklearn import ensemble
from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import log_loss
from sklearn import cross_validation

# Build a classification task using 30 informative features
X, y = make_classification(n_samples=1000,
                           n_features=100,
                           n_informative=30,
                           n_redundant=0,
                           n_repeated=0,
                           n_classes=9,
                           random_state=0,
                           shuffle=False)

skf = cross_validation.StratifiedShuffleSplit(y, 5)

for train, test in skf:
    X_train, X_test = X[train], X[test]
    y_train, y_test = y[train], y[test]

    # Calibrated classifier (3-fold internal calibration)
    clf = ensemble.GradientBoostingClassifier(n_estimators=100)
    clf_cv = CalibratedClassifierCV(clf, cv=3, method='isotonic')
    clf_cv.fit(X_train, y_train)
    probas_cv = clf_cv.predict_proba(X_test)
    cv_score = log_loss(y_test, probas_cv)

    # Plain classifier fit on the full training split
    clf = ensemble.GradientBoostingClassifier(n_estimators=100)
    clf.fit(X_train, y_train)
    probas = clf.predict_proba(X_test)
    clf_score = log_loss(y_test, probas)

    print 'calibrated score:', cv_score
    print 'direct clf score:', clf_score
    print
One run yielded:
Maybe I'm missing something about how CalibratedClassifierCV works, or am not using it correctly, but I was under the impression that, if anything, passing a classifier to CalibratedClassifierCV would result in improved performance relative to the base_estimator alone.
Can anyone explain this observed underperformance?
CalibratedClassifierCV. Probability calibration with isotonic regression or logistic regression. This class uses cross-validation to both estimate the parameters of a classifier and subsequently calibrate a classifier.
Usage. The CalibratedClassifierCV class is used to calibrate a classifier. CalibratedClassifierCV uses a cross-validation approach to ensure unbiased data is always used to fit the calibrator. The data is split into k (train_set, test_set) couples (as determined by cv).
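For reference, a minimal usage sketch under current scikit-learn (module paths from sklearn.model_selection rather than the older sklearn.cross_validation used in the question; the dataset sizes here are illustrative, not taken from the question):

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.calibration import CalibratedClassifierCV
from sklearn.model_selection import train_test_split
from sklearn.metrics import log_loss

X, y = make_classification(n_samples=1000, n_features=20, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# cv=3 splits X_train into 3 (train_set, test_set) couples internally:
# the base estimator is fit on each train part, the calibrator on the test part.
base = GradientBoostingClassifier(n_estimators=100)
calibrated = CalibratedClassifierCV(base, cv=3, method='sigmoid')
calibrated.fit(X_train, y_train)

print('calibrated log loss:', log_loss(y_test, calibrated.predict_proba(X_test)))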
Calibration is an important, albeit often overlooked, aspect of training machine learning classifiers. It gives insight into model uncertainty, which can later be communicated to end users or used in further processing of the model outputs.
A model with 70% accuracy and 0.7 confidence in each prediction is well calibrated; a model with 70% accuracy and 0.9 confidence in each prediction is ill-calibrated.
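As a concrete check of that distinction, here is a sketch using sklearn.calibration.calibration_curve on a binary problem of my own construction (the curve is defined for binary targets). For a well-calibrated model, the observed frequency of positives in each bin should track the mean predicted probability:

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.calibration import calibration_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, n_informative=5,
                           n_classes=2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = GradientBoostingClassifier(n_estimators=100).fit(X_train, y_train)
prob_pos = clf.predict_proba(X_test)[:, 1]

# For each confidence bin: observed fraction of positives vs. mean predicted probability.
frac_pos, mean_pred = calibration_curve(y_test, prob_pos, n_bins=10)
for f, p in zip(frac_pos, mean_pred):
    print('mean predicted prob: %.2f  observed frequency: %.2f' % (p, f))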
An alternative might be to hold out a separate validation set, which has the disadvantage of leaving less data for training. Also, if CalibratedClassifierCV should only be fit on models that were fit on a different training set, why would its default option be cv=3, which also fits the base estimator?
Calibrating a classifier is as easy as passing it to scikit-learn's CalibratedClassifierCV. The method argument can be either sigmoid (the default, i.e. Platt scaling via logistic regression) or isotonic. One can then draw the calibration curve for the new, calibrated model on top of the previous one.
There are two things mentioned in the CalibratedClassifierCV docs that hint at the ways it can be used:
base_estimator: If cv=prefit, the classifier must have been fit already on data.
cv: If "prefit" is passed, it is assumed that base_estimator has been fitted already and all data is used for calibration.
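Following that hint, here is a sketch of the cv='prefit' route (this is also the "validation set" alternative mentioned above: the base estimator is trained on one split, and a separate held-out split is used only to fit the calibrator). The split sizes are arbitrary choices of mine:

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.calibration import CalibratedClassifierCV
from sklearn.model_selection import train_test_split
from sklearn.metrics import log_loss

X, y = make_classification(n_samples=2000, n_features=20, n_informative=5,
                           n_classes=3, random_state=0)
# Three disjoint splits: one to fit the model, one to calibrate, one to evaluate.
X_fit, X_rest, y_fit, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_calib, X_test, y_calib, y_test = train_test_split(X_rest, y_rest, test_size=0.5,
                                                    random_state=0)

base = GradientBoostingClassifier(n_estimators=100).fit(X_fit, y_fit)

# With cv='prefit', fit() only trains the calibrator; the base model is left as-is.
calibrated = CalibratedClassifierCV(base, cv='prefit', method='sigmoid')
calibrated.fit(X_calib, y_calib)

print('prefit calibrated log loss:', log_loss(y_test, calibrated.predict_proba(X_test)))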
The probability calibration itself requires cross-validation, therefore the CalibratedClassifierCV trains a calibrated classifier per fold (in this case using StratifiedKFold), and takes the mean of the predicted probabilities from each classifier when you call predict_proba(). This could explain the effect.
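That averaging can be checked directly, assuming a scikit-learn version that exposes the fitted per-fold models through the calibrated_classifiers_ attribute; the sketch below verifies that predict_proba() equals the mean of the per-fold calibrated predictions on a small synthetic dataset of my own:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.calibration import CalibratedClassifierCV
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf_cv = CalibratedClassifierCV(GradientBoostingClassifier(n_estimators=50),
                                cv=3, method='isotonic')
clf_cv.fit(X_train, y_train)

# One calibrated sub-classifier per fold, each fit on 2/3 of X_train.
print(len(clf_cv.calibrated_classifiers_))  # 3

per_fold = np.array([c.predict_proba(X_test) for c in clf_cv.calibrated_classifiers_])
print(np.allclose(per_fold.mean(axis=0), clf_cv.predict_proba(X_test)))  # True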
My hypothesis is that if the training set is small with respect to the number of features and classes, the reduced training set for each sub-classifier affects performance and the ensembling does not make up for it (or makes it worse). Also, the GradientBoostingClassifier might already provide pretty good probability estimates from the start, as its loss function is optimized for probability estimation.
If that's correct, ensembling classifiers the same way as the CalibratedClassifierCV but without calibration should be worse than the single classifier. Also, the effect should disappear when using a larger number of folds for calibration.
To test that, I extended your script to increase the number of folds and to include the ensembled classifier without calibration, and I was able to confirm my predictions. A 10-fold calibrated classifier always performed better than the single classifier, and the uncalibrated ensemble was significantly worse. In my run, the 3-fold calibrated classifier also did not really perform worse than the single classifier, so this might also be an unstable effect. These are the detailed results on the same dataset:
This is the code from my experiment:
import numpy as np
from sklearn.datasets import make_classification
from sklearn import ensemble
from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import log_loss
from sklearn import cross_validation

X, y = make_classification(n_samples=1000,
                           n_features=100,
                           n_informative=30,
                           n_redundant=0,
                           n_repeated=0,
                           n_classes=9,
                           random_state=0,
                           shuffle=False)

skf = cross_validation.StratifiedShuffleSplit(y, 5)

for train, test in skf:
    X_train, X_test = X[train], X[test]
    y_train, y_test = y[train], y[test]

    # 3-fold calibrated classifier
    clf = ensemble.GradientBoostingClassifier(n_estimators=100)
    clf_cv = CalibratedClassifierCV(clf, cv=3, method='isotonic')
    clf_cv.fit(X_train, y_train)
    probas_cv = clf_cv.predict_proba(X_test)
    cv_score = log_loss(y_test, probas_cv)
    print 'calibrated score (3-fold):', cv_score

    # 10-fold calibrated classifier
    clf = ensemble.GradientBoostingClassifier(n_estimators=100)
    clf_cv = CalibratedClassifierCV(clf, cv=10, method='isotonic')
    clf_cv.fit(X_train, y_train)
    probas_cv = clf_cv.predict_proba(X_test)
    cv_score = log_loss(y_test, probas_cv)
    print 'calibrated score (10-fold):', cv_score

    # Train 3 classifiers on folds of the training set (not the test set) and
    # take the average probability, i.e. the ensembling that
    # CalibratedClassifierCV does internally, but without calibration.
    skf2 = cross_validation.StratifiedKFold(y_train, 3)
    probas_list = []
    for sub_train, sub_test in skf2:
        X_sub_train, X_sub_test = X_train[sub_train], X_train[sub_test]
        y_sub_train, y_sub_test = y_train[sub_train], y_train[sub_test]
        clf = ensemble.GradientBoostingClassifier(n_estimators=100)
        clf.fit(X_sub_train, y_sub_train)
        probas_list.append(clf.predict_proba(X_test))
    probas = np.mean(probas_list, axis=0)
    clf_ensemble_score = log_loss(y_test, probas)
    print 'uncalibrated ensemble clf (3-fold) score:', clf_ensemble_score

    # Single classifier fit on the full training split
    clf = ensemble.GradientBoostingClassifier(n_estimators=100)
    clf.fit(X_train, y_train)
    probas = clf.predict_proba(X_test)
    score = log_loss(y_test, probas)
    print 'direct clf score:', score
    print
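A side note, since the snippets above use the pre-0.18 API: under current scikit-learn and Python 3, the splitting boilerplate would look roughly like the sketch below (sklearn.cross_validation was replaced by sklearn.model_selection, the splitters now take the data in split(), and the Python 2 print statements become print(...) calls):

from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedShuffleSplit

X, y = make_classification(n_samples=1000, n_features=100, n_informative=30,
                           n_redundant=0, n_repeated=0, n_classes=9,
                           random_state=0, shuffle=False)

sss = StratifiedShuffleSplit(n_splits=5, random_state=0)
for train, test in sss.split(X, y):
    X_train, X_test = X[train], X[test]
    y_train, y_test = y[train], y[test]
    # ... fit, calibrate and score exactly as above ...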