
Why does calling transform() on test data return an error that the data is not fitted yet?

While performing feature scaling, if I call StandardScaler() directly each time instead of assigning it to a variable, like this:

from sklearn.preprocessing import StandardScaler

x_train = StandardScaler().fit_transform(x_train)

x_test = StandardScaler().transform(x_test)

It gives the following error:

NotFittedError: This StandardScaler instance is not fitted yet. Call 'fit' with appropriate arguments before using this method.

whereas the following code works fine (after assigning StandardScaler() to a variable):

from sklearn.preprocessing import StandardScaler

sc_x = StandardScaler()

x_train = sc_x.fit_transform(x_train)

x_test = sc_x.transform(x_test)

Here, x_train is the training dataset and x_test is the test dataset.

Can someone please explain why this happens?

keenlearner asked Oct 16 '25 17:10



1 Answer

When you call StandardScaler(), you create a new, unfitted object of the StandardScaler class. If you want to use it, you have to fit it before you can transform any data with it.

What you "told" the code to do was (pseudocode):

  1. Create a new scaler object
  2. Fit it to your training data
  3. Create another new scaler object
  4. Don't fit it to anything, but use it to transform some data
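As a rough sketch, the four steps above map onto code like this. The toy arrays and the names `first`, `second`, and `raised` are made up for illustration and are not from the question; the only assumption is that scikit-learn and NumPy are installed.

```python
import numpy as np
from sklearn.exceptions import NotFittedError
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data standing in for x_train / x_test.
x_train = np.array([[1.0], [2.0], [3.0]])
x_test = np.array([[4.0]])

first = StandardScaler()                      # step 1: new, unfitted scaler
train_scaled = first.fit_transform(x_train)   # step 2: fit it to the training data

second = StandardScaler()                     # step 3: another new, unfitted scaler
try:
    second.transform(x_test)                  # step 4: transform without fitting
    raised = False
except NotFittedError:
    raised = True

print(raised)
```

The two calls to StandardScaler() return separate objects, so whatever `first` learned during fitting is not available to `second`.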

In the second example, you created a single scaler object, fitted it to your training data, then used the same object to transform your test data (which is the correct approach).
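To see why reusing the same object matters, here is a small sketch (again with made-up toy data) showing that a fitted StandardScaler keeps the training mean and standard deviation in its `mean_` and `scale_` attributes, and that transform() applies those stored values to the test data. The name `manual` is only for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data standing in for x_train / x_test.
x_train = np.array([[1.0], [2.0], [3.0]])
x_test = np.array([[4.0]])

sc_x = StandardScaler()
x_train_scaled = sc_x.fit_transform(x_train)  # learns mean_ and scale_ from x_train
x_test_scaled = sc_x.transform(x_test)        # reuses the training statistics

# Reproduce transform() by hand from the stored training statistics.
manual = (x_test - sc_x.mean_) / sc_x.scale_

print(np.allclose(x_test_scaled, manual))
```

Because the test set is scaled with the training statistics, both sets end up on the same scale, which is what a model trained on x_train expects.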

G. Anderson answered Oct 18 '25 07:10
