Does the pipeline object in sklearn transform the test data when using the .predict() method?

Question

When I use a pipeline object,

Does the pipeline object fit and transform the train data when I use the .fit() method? Or should I use the .fit_transform() method? What is the difference between the two?
When I use the .predict() method on the test data, does the pipeline object transform the test data and only then predict it? That is, should I transform the test data using the .transform() method before I use the .predict() method?

This is the code I have:

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

#creating some data (5 random features, so that k=3 and n_components=2 are valid)
X, y = np.random.RandomState(0).rand(50, 5), np.hstack(([0] * 45, [1] * 5))

#creating the pipeline
steps = [('scaler', StandardScaler()), ('SelectKBest', SelectKBest(f_classif, k=3)), ('pca', PCA(n_components=2)), ('DT', DecisionTreeClassifier(random_state=0))]
model = Pipeline(steps=steps)

#splitting the data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

model.fit(X_train,y_train)

model.predict(X_test)

Antoine Dubuis · Accepted Answer

The Pipeline object exposes the methods of its last step. Since your last step is a DecisionTreeClassifier (an estimator), the pipeline will not have a fit_transform() method, but it will have estimator methods such as fit(), predict(), score(), etc.
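A quick way to see this is to check which methods two different pipelines expose. This is only a minimal sketch: the names clf_pipe and tf_pipe are made up for illustration, and it checks transform() rather than fit_transform() because, as far as I know, that check behaves the same way across scikit-learn versions.

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Pipeline whose last step is a classifier (hypothetical name: clf_pipe)
clf_pipe = Pipeline([("scaler", StandardScaler()),
                     ("DT", DecisionTreeClassifier(random_state=0))])

# Pipeline made only of transformers (hypothetical name: tf_pipe)
tf_pipe = Pipeline([("scaler", StandardScaler()),
                    ("pca", PCA(n_components=2))])

# The available methods follow the last step of each pipeline
clf_has_predict = hasattr(clf_pipe, "predict")
clf_has_transform = hasattr(clf_pipe, "transform")
tf_has_predict = hasattr(tf_pipe, "predict")
tf_has_transform = hasattr(tf_pipe, "transform")

print("classifier pipeline:", clf_has_predict, clf_has_transform)
print("transformer pipeline:", tf_has_predict, tf_has_transform)
```

The classifier pipeline should report predict but not transform, and the transformer-only pipeline the reverse.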

When you call fit(), the pipeline calls fit_transform() on each transformer in order, passing the output of each step to the next one, and finally calls fit() on the estimator.
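To make the fit() behaviour concrete, here is a rough sketch that fits the same steps by hand and compares the learned parameters with those of a fitted pipeline. The synthetic data from make_classification and the variable names are my own assumptions, not part of the question.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic training data (an assumption for illustration)
X_train, y_train = make_classification(
    n_samples=100, n_features=6, n_informative=3, random_state=0)

# Manual version: fit_transform() each transformer, then fit() the estimator
scaler = StandardScaler()
selector = SelectKBest(f_classif, k=3)
pca = PCA(n_components=2)
tree = DecisionTreeClassifier(random_state=0)

Xt = scaler.fit_transform(X_train)
Xt = selector.fit_transform(Xt, y_train)
Xt = pca.fit_transform(Xt)
tree.fit(Xt, y_train)

# Pipeline version: a single fit() call does the same chain internally
pipe = Pipeline([("scaler", StandardScaler()),
                 ("SelectKBest", SelectKBest(f_classif, k=3)),
                 ("pca", PCA(n_components=2)),
                 ("DT", DecisionTreeClassifier(random_state=0))])
pipe.fit(X_train, y_train)

# The fitted steps inside the pipeline learned the same parameters
same_scaler = np.allclose(scaler.mean_, pipe.named_steps["scaler"].mean_)
same_selection = np.array_equal(
    selector.get_support(), pipe.named_steps["SelectKBest"].get_support())
same_pca = np.allclose(pca.components_, pipe.named_steps["pca"].components_)
print(same_scaler, same_selection, same_pca)
```

So calling fit() on the pipeline is enough; there is no need to call fit_transform() on the training data yourself.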

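When you call predict(), the pipeline calls transform() on the data with each already-fitted transformer and then calls predict() on the estimator. So you should not transform the test data yourself before calling predict().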
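The predict() behaviour can be checked the same way. The sketch below, again on synthetic data that I am assuming for illustration, pushes the test set through each fitted transformer by hand and compares the result with model.predict(X_test).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (an assumption for illustration, not the asker's data)
X, y = make_classification(
    n_samples=200, n_features=6, n_informative=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

model = Pipeline([("scaler", StandardScaler()),
                  ("SelectKBest", SelectKBest(f_classif, k=3)),
                  ("pca", PCA(n_components=2)),
                  ("DT", DecisionTreeClassifier(random_state=0))])
model.fit(X_train, y_train)

# What predict() does, written out by hand:
# transform() with every fitted transformer, then predict() with the last step
Xt = X_test
for name, transformer in model.steps[:-1]:
    Xt = transformer.transform(Xt)
manual_pred = model.steps[-1][1].predict(Xt)

# The pipeline does all of this in one call on the raw test data
pipeline_pred = model.predict(X_test)
same_predictions = np.array_equal(manual_pred, pipeline_pred)
print(same_predictions)
```

Note that only transform() is used on the test data, never fit_transform(), so the scaler, feature selection and PCA all keep the parameters learned from the training set.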

[Image not shown: diagram of the pipeline fit and predict flow, from Raschka, Sebastian. Python Machine Learning. Birmingham, UK: Packt Publishing, 2015. Print.]

Tags: python, machine-learning, scikit-learn

user42
