
What's the right way to insert a CalibratedClassifierCV in a scikit-learn pipeline?

I am trying to add a calibration step to a sklearn pipeline to obtain a calibrated classifier and thus get more trustworthy output probabilities.

So far I clumsily tried to insert a 'calibration' step using CalibratedClassifierCV along the lines of (silly example for reproducibility):

import sklearn.datasets
import pandas as pd
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.linear_model import SGDClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.calibration import CalibratedClassifierCV

data = sklearn.datasets.fetch_20newsgroups(categories=['alt.atheism', 'sci.space'])
df = pd.DataFrame(data = np.c_[data['data'], data['target']])\
       .rename({0:'text', 1:'class'}, axis = 'columns')

my_pipeline = Pipeline([
    ('vectorizer', TfidfVectorizer()),
    ('classifier', SGDClassifier(loss='modified_huber')),
    ('calibrator', CalibratedClassifierCV(cv=5, method='isotonic'))
])

my_pipeline.fit(df['text'].values, df['class'].values)

but that doesn't work (at least not in this way). Does anyone have tips about how to properly do this?

Davide Fiocco asked Apr 14 '18 15:04


1 Answer

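The SGDClassifier object should be passed to CalibratedClassifierCV as the estimator it wraps, via the base_estimator argument (renamed to estimator in scikit-learn 1.2, with the old name removed in 1.4), rather than added as a separate pipeline step.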

Your code should probably look something like this:

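from sklearn.calibration import CalibratedClassifierCV

# On scikit-learn < 1.2 the keyword is base_estimator rather than estimator.
my_pipeline = Pipeline([
    ('vectorizer', TfidfVectorizer()),
    ('classifier', CalibratedClassifierCV(estimator=SGDClassifier(loss='modified_huber'), cv=5, method='isotonic'))
])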

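CalibratedClassifierCV is a meta-estimator: it wraps another classifier and fits it internally during cross-validation, so the wrapped classifier belongs inside it. The original attempt fails because every step of a Pipeline except the last must be a transformer (implementing fit and transform), and SGDClassifier is not one.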
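
For illustration, here is a minimal end-to-end sketch of the wrapped setup. It does not use the 20 newsgroups data from the question: the toy corpus, the word lists and the make_doc helper are hypothetical stand-ins, chosen so the example runs without a download. The try/except is only an assumption-driven way to cope with the keyword being named estimator in newer scikit-learn releases and base_estimator in older ones.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus standing in for the two newsgroups categories.
rng = np.random.default_rng(0)
space_words = ['orbit', 'rocket', 'launch', 'moon', 'satellite', 'nasa']
religion_words = ['belief', 'faith', 'god', 'church', 'atheist', 'bible']

def make_doc(words):
    # Build a short pseudo-document by sampling words with replacement.
    return ' '.join(rng.choice(words, size=8))

texts = ([make_doc(space_words) for _ in range(30)]
         + [make_doc(religion_words) for _ in range(30)])
labels = np.array([1] * 30 + [0] * 30)

base = SGDClassifier(loss='modified_huber', random_state=0)
try:
    # Keyword name in newer scikit-learn releases.
    calibrated = CalibratedClassifierCV(estimator=base, cv=5, method='isotonic')
except TypeError:
    # Keyword name in older scikit-learn releases.
    calibrated = CalibratedClassifierCV(base_estimator=base, cv=5, method='isotonic')

# The calibrated classifier is the single, final step after the vectorizer.
my_pipeline = Pipeline([
    ('vectorizer', TfidfVectorizer()),
    ('classifier', calibrated),
])
my_pipeline.fit(texts, labels)

# One row per input document, one column per class.
proba = my_pipeline.predict_proba(['rocket launch to the moon',
                                   'faith and the church'])
print(proba)
```

Because the calibrator is the last step, the vectorizer is fitted once on the data passed to fit, and the 5-fold splitting used for calibration happens inside CalibratedClassifierCV on the already vectorized features.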

Ami Tavory answered Sep 22 '22 20:09