What's the right way to insert a CalibratedClassifierCV in a scikit-learn pipeline?

Question

I am trying to add a calibration step to a sklearn pipeline to obtain a calibrated classifier, and thus more trustworthy output probabilities.

So far I have clumsily tried to insert a 'calibration' step using CalibratedClassifierCV, along these lines (a toy example for reproducibility):

import sklearn.datasets
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.linear_model import SGDClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.calibration import CalibratedClassifierCV

data = sklearn.datasets.fetch_20newsgroups(categories=['alt.atheism', 'sci.space'])
df = pd.DataFrame({'text': data['data'], 'class': data['target']})

# Rejected with a TypeError: every step but the last must implement transform
my_pipeline = Pipeline([
    ('vectorizer', TfidfVectorizer()),
    ('classifier', SGDClassifier(loss='modified_huber')),
    ('calibrator', CalibratedClassifierCV(cv=5, method='isotonic'))
])

my_pipeline.fit(df['text'].values, df['class'].values)

But that doesn't work (at least not this way). Does anyone have tips on how to do this properly?
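A minimal sketch of why the three-step pipeline is rejected, using a tiny made-up corpus in place of the 20 newsgroups download (the texts, the labels, and the helper name `try_classifier_as_middle_step` are illustrative assumptions). Pipeline requires every step except the last to implement `transform`, and SGDClassifier does not, so the pipeline raises a TypeError before any calibration happens.

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.calibration import CalibratedClassifierCV

# Toy two-class corpus standing in for the newsgroups data (an assumption).
texts = ["space rocket launch", "god belief faith", "orbit satellite nasa",
         "atheism religion debate"] * 10
labels = [1, 0, 1, 0] * 10


def try_classifier_as_middle_step():
    """Hypothetical helper: return the error raised when a classifier is not the last step."""
    try:
        # Depending on the scikit-learn version, the step check runs at
        # construction or at fit time, so both happen inside the try block.
        pipe = Pipeline([
            ('vectorizer', TfidfVectorizer()),
            ('classifier', SGDClassifier(loss='modified_huber')),
            ('calibrator', CalibratedClassifierCV(cv=5, method='isotonic')),
        ])
        pipe.fit(texts, labels)
    except TypeError as exc:
        return exc
    return None


error = try_classifier_as_middle_step()
print(type(error).__name__, error)
```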

Ami Tavory · Accepted Answer

The SGDClassifier object should go into CalibratedClassifierCV's base_estimator argument (renamed estimator in scikit-learn 1.2; the old name was removed in 1.4).

Your code should probably look something like this:

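from sklearn.calibration import CalibratedClassifierCV

my_pipeline = Pipeline([
    ('vectorizer', TfidfVectorizer()),
    # First argument: base_estimator (called estimator since scikit-learn 1.2)
    ('classifier', CalibratedClassifierCV(SGDClassifier(loss='modified_huber'), cv=5, method='isotonic'))
])
my_pipeline.fit(df['text'].values, df['class'].values)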
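A self-contained sketch of the suggested pipeline, again on a small invented corpus rather than 20 newsgroups (the texts and labels are assumptions for illustration). The classifier is passed positionally, which works whether the parameter is named base_estimator or estimator in your scikit-learn version; `random_state` is added only to make the run repeatable.

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.calibration import CalibratedClassifierCV

# Toy two-class corpus standing in for the newsgroups data (an assumption).
texts = ["space rocket launch", "god belief faith", "orbit satellite nasa",
         "atheism religion debate"] * 10
labels = [1, 0, 1, 0] * 10

my_pipeline = Pipeline([
    ('vectorizer', TfidfVectorizer()),
    # The calibrator wraps the classifier, so together they form the final step.
    ('classifier', CalibratedClassifierCV(
        SGDClassifier(loss='modified_huber', random_state=0),
        cv=5, method='isotonic')),
])
my_pipeline.fit(texts, labels)

# Calibrated probabilities for unseen text: one row per document,
# one column per class, in the order given by my_pipeline.classes_.
proba = my_pipeline.predict_proba(["nasa rocket orbit", "faith and religion"])
print(my_pipeline.classes_, proba)
```

Because the calibration lives inside the last step, `predict_proba` on the pipeline returns the calibrated probabilities directly.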

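CalibratedClassifierCV is a meta-estimator: it wraps the classifier and calibrates that classifier's outputs itself, so it replaces the classifier step rather than following it. A Pipeline accepts a non-transformer only as its last step; every earlier step must implement transform.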
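To see the meta-estimator structure after fitting, here is a sketch on the same kind of toy data (the corpus is an assumption, as is the `alpha_key` lookup, which is just one way to cope with the parameter rename). With an integer cv and the default ensemble behaviour, CalibratedClassifierCV keeps one calibrated copy of the wrapped classifier per fold, and the wrapped SGDClassifier's hyperparameters remain reachable through nested parameter names, for example when tuning them.

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.calibration import CalibratedClassifierCV

# Toy two-class corpus standing in for the newsgroups data (an assumption).
texts = ["space rocket launch", "god belief faith", "orbit satellite nasa",
         "atheism religion debate"] * 10
labels = [1, 0, 1, 0] * 10

pipe = Pipeline([
    ('vectorizer', TfidfVectorizer()),
    ('classifier', CalibratedClassifierCV(
        SGDClassifier(loss='modified_huber', random_state=0),
        cv=5, method='isotonic')),
])
pipe.fit(texts, labels)

# One fitted (classifier, calibrator) pair is kept per cross-validation fold.
n_calibrated = len(pipe.named_steps['classifier'].calibrated_classifiers_)

# The wrapped classifier's parameters are nested under the step name. The
# middle part is 'estimator' in scikit-learn >= 1.2 and 'base_estimator' before.
params = pipe.get_params()
alpha_key = next(key for key in ('classifier__estimator__alpha',
                                 'classifier__base_estimator__alpha')
                 if key in params)
pipe.set_params(**{alpha_key: 1e-3})  # e.g. change SGDClassifier's alpha
print(n_calibrated, alpha_key)
```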

Tags: python, pandas, scikit-learn

Davide Fiocco
