Do I need to scale test data and Dependent variable in the train data?

Question

I am new to the concept of feature scaling in machine learning. I read that scaling is useful when one feature's range is much larger than that of the other features. But if I choose to scale the training data, then:

Can I scale just the one feature that has a high range?
If I scale the entire X of the training data, do I also need to scale the y of the training data and the entire test data?

im_w0lf · Accepted Answer

Yes, you can scale just the one feature that has a high range, but make sure that no other feature has a high range as well. If such a feature exists and has not been scaled, it will make the algorithm overlook the contributions of the scaled features, and even a slight change in it will affect the result (output value). It is recommended (but not compulsory) to scale all the features in the training set.
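As a minimal sketch of scaling only one column, assuming scikit-learn (per the question's tags): the toy array and the choice of column index 1 as the high-range feature are hypothetical, made up for illustration.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: column 0 has a small range, column 1 a very large one.
X_train = np.array([
    [1.0, 20000.0],
    [2.0, 50000.0],
    [3.0, 80000.0],
    [4.0, 110000.0],
])

# Standardize only column 1 and pass every other column through untouched.
scale_one = ColumnTransformer(
    transformers=[("scale_high_range", StandardScaler(), [1])],
    remainder="passthrough",
)
X_train_scaled = scale_one.fit_transform(X_train)

# Note: ColumnTransformer puts the transformed columns first, so the scaled
# feature is now output column 0 and the untouched feature is output column 1.
```

Keep the column reordering in mind if later code refers to features by position.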
You do not need to scale the y of the training data, because the algorithm or model will set its parameter values to get the least cost (error), that is, some function of Y(output) - Y(original), anyway. But if X_train was scaled, then the test features (X_test) must also be scaled, using the mean and variance computed from the training set, before they are fed to the model. (Scale y_test only if y_train was scaled.) The model has not seen the test data before and was trained on data in the scaled range, so if a test feature value lies far outside the corresponding feature's range in the training data, the model will output a wrong prediction for that test example.
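The workflow described above might look like the following sketch, assuming scikit-learn's StandardScaler and a plain linear regression. The synthetic data, coefficients, and split size are made-up assumptions, not part of the original question.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic data: feature 1 has a far larger range than feature 0.
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(0, 10, 200), rng.uniform(0, 100000, 200)])
y = 3.0 * X[:, 0] + 0.0001 * X[:, 1] + rng.normal(0, 0.1, 200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

scaler = StandardScaler()
# fit_transform learns the mean and standard deviation from the training set only.
X_train_scaled = scaler.fit_transform(X_train)
# transform reuses those training statistics; the scaler is never refit on test data.
X_test_scaled = scaler.transform(X_test)

# y is left unscaled, so y_test is left unscaled as well.
model = LinearRegression().fit(X_train_scaled, y_train)
r2 = model.score(X_test_scaled, y_test)
```

Calling `fit` or `fit_transform` on the test set instead would standardize it with its own statistics, which is not what the model was trained on.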

Tags:

python

machine-learning

scikit-learn
