I have defined data for fitting with one categorical feature "sex":
data = pd.DataFrame({
    'age': [25, 19, 17],
    'sex': ['female', 'male', 'female'],
    'won_lottery': [False, True, False]
})
X = data[['age', 'sex']]
y = data['won_lottery']
and a pipeline for transforming categorical features:
ohe = OneHotEncoder(handle_unknown='ignore')
cat_transformers = Pipeline([
    ('onehot', ohe)
])
When fitting cat_transformers directly with the data:
cat_transformers.fit(X[['sex']], y)
print(ohe.get_feature_names())
I am able to get the names of the output features created by the OneHotEncoder instance:
['x0_female' 'x0_male']
However, if I encapsulate cat_transformers into a ColumnTransformer:
preprocessor = ColumnTransformer(
    transformers=[
        ('cat', cat_transformers, ['sex'])
    ]
)
preprocessor.fit(X, y)
print(ohe.get_feature_names())
it fails with
sklearn.exceptions.NotFittedError: This OneHotEncoder instance is not fitted yet.
Call 'fit' with appropriate arguments before using this method.
I would expect that calling fit() on the ColumnTransformer would in turn call fit() on all of its transformers. Why does it not work this way?
Ok, I understand it now. I was fitting one instance of OneHotEncoder and checking feature names on another instance:
print(id(ohe))
print(id(preprocessor.named_transformers_['cat'].named_steps['onehot']))
2757198591872
2755226729104
It looks like ColumnTransformer clones its transformers before fitting, so the original ohe object is never fitted.