
Does scikit-learn's fit_transform also transform my original dataframe?

I am using scikit-learn's StandardScaler() and have noticed that after I call transform(xtrain) or fit_transform(xtrain), my xtrain dataframe is changed as well. Is this supposed to happen? How can I stop StandardScaler from changing my dataframe? (I have tried using copy=False.)

from sklearn.preprocessing import StandardScaler

xtrain.describe()    # everything ok here
scalar = StandardScaler()
xtrain2 = scalar.fit_transform(xtrain)

At this stage, I would expect xtrain NOT to have changed and xtrain2 to be a scaled version of xtrain. But when I run describe() on the two dataframes, I see that they are identical and both have been scaled. Why is that?

I experience the same problem when I do:

scalekey = scalar.fit(xtrain)
xtrain2 = scalekey.transform(xtrain)
Jason asked Sep 27 '22 10:09



1 Answer

You can make a copy and pass that in, so that your df is not modified:

# scale a copy and keep the returned result; xtrain itself is never passed in
xtrain2 = scalar.fit_transform(xtrain.copy())

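The docs state that copy=True is the default for StandardScaler, so with the defaults it should not have modified your df. Passing copy=False asks the scaler to scale in place where it can, which would explain xtrain changing.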
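As a quick check, here is a minimal sketch of the default behaviour. The small xtrain frame and the names scaler, before and unchanged are made up for illustration and are not the asker's data. It assumes pandas and scikit-learn are installed and that the scaler is left at its default copy=True.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the asker's xtrain.
xtrain = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]})
before = xtrain.copy()  # snapshot to compare against afterwards

scaler = StandardScaler()              # copy=True is the default
scaled = scaler.fit_transform(xtrain)  # returns a NumPy array by default

# With copy=True the input frame should be left untouched.
unchanged = xtrain.equals(before)

# Wrap the array again if you want the column names and index back.
xtrain2 = pd.DataFrame(scaled, columns=xtrain.columns, index=xtrain.index)
```

If unchanged comes out False in your own session, check whether the scaler was created with copy=False somewhere earlier.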

EdChum answered Oct 19 '22 23:10
