 

When to use MinMaxScaler and StandardScaler

When is it preferable to use MinMaxScaler and when StandardScaler? I think it depends on the data. Are there any features of the data to look at when deciding which preprocessing method to use? I looked at the docs, but can someone give me more insight into it?

Akash Chandra asked Mar 21 '18 13:03



1 Answer

The right scaling does indeed depend on the type of data that you have. In most cases, StandardScaler is the scaler of choice. If you know that you have some outliers, go for the RobustScaler.
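To make the outlier point concrete, here is a minimal sketch. The toy feature (five ordinary values plus one extreme value) is made up for illustration and is not from the question. It compares how much spread the ordinary values keep after each scaler.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler, StandardScaler

# Hypothetical toy feature: five ordinary values plus one extreme outlier.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [1000.0]])

# StandardScaler centers on the mean and divides by the standard deviation,
# both of which are dominated by the outlier.
standard = StandardScaler().fit_transform(X)

# RobustScaler centers on the median and divides by the interquartile range,
# which the single outlier barely affects.
robust = RobustScaler().fit_transform(X)

# Range (max - min) of the five ordinary values after each scaling.
standard_spread = float(standard[:5].max() - standard[:5].min())
robust_spread = float(robust[:5].max() - robust[:5].min())

print("StandardScaler:", standard.ravel().round(3))
print("RobustScaler:  ", robust.ravel().round(3))
print("spread of ordinary values:", standard_spread, "vs", robust_spread)
```

With StandardScaler the ordinary values end up squeezed into a very narrow interval, while RobustScaler keeps them well separated.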

Then, if you deal with features that have an unusual distribution, for instance the pixels of the digits dataset, these scalers will not be the best choice. Indeed, in this dataset a lot of pixels are at zero, so the distribution has a peak at zero and dividing by the standard deviation will not be beneficial. So basically, when the distribution of a feature is far from Normal, you need to take an alternative.

In the case of the digits, the MinMaxScaler is a much better choice. However, if you want to keep the zeros at zero (because you use sparse matrices), go for a MaxAbsScaler.
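As a rough sketch of both options on the digits data, assuming the copy bundled with scikit-learn (load_digits) is the dataset meant here: MinMaxScaler maps each pixel feature into [0, 1], and MaxAbsScaler can be applied directly to a sparse matrix without filling in the zeros.

```python
import numpy as np
from scipy import sparse
from sklearn.datasets import load_digits
from sklearn.preprocessing import MaxAbsScaler, MinMaxScaler

# 8x8 digit images flattened to 64 pixel features.
X = load_digits().data

# Fraction of pixel values that are exactly zero (the peak at zero).
zero_fraction = float(np.mean(X == 0))

# MinMaxScaler rescales each feature to the [0, 1] range.
X_minmax = MinMaxScaler().fit_transform(X)

# MaxAbsScaler only divides each feature by its maximum absolute value,
# so zeros stay zeros and a sparse input stays sparse.
X_sparse = sparse.csr_matrix(X)
X_maxabs = MaxAbsScaler().fit_transform(X_sparse)

print("fraction of zero pixels:", round(zero_fraction, 3))
print("MinMaxScaler range:", X_minmax.min(), X_minmax.max())
print("stored non-zeros before/after MaxAbsScaler:", X_sparse.nnz, X_maxabs.nnz)
```

The number of stored non-zero entries is the same before and after MaxAbsScaler, which is the reason to prefer it for sparse data.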

NB: also look at the QuantileTransformer and the PowerTransformer if you want a feature to follow a Normal distribution (or a Uniform one, with the QuantileTransformer) whatever the original distribution was.
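A small sketch of these two transformers on a skewed feature follows. The exponential sample, the skewness helper and the parameter values (n_quantiles=100, random_state=0) are choices made for this illustration only.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer, QuantileTransformer

# Hypothetical strongly right-skewed feature.
rng = np.random.default_rng(0)
X = rng.exponential(scale=2.0, size=(1000, 1))


def skewness(a):
    """Sample skewness: third central moment over variance ** 1.5."""
    a = np.asarray(a, dtype=float).ravel()
    centered = a - a.mean()
    return float(np.mean(centered**3) / np.mean(centered**2) ** 1.5)


# QuantileTransformer maps the empirical quantiles onto a Normal distribution
# (output_distribution="uniform" would target a Uniform one instead).
to_normal = QuantileTransformer(
    n_quantiles=100, output_distribution="normal", random_state=0
)
X_quantile = to_normal.fit_transform(X)

# PowerTransformer fits a parametric power transform (Yeo-Johnson here)
# to make the feature more Gaussian-like.
X_power = PowerTransformer(method="yeo-johnson").fit_transform(X)

print("skewness before:", skewness(X))
print("skewness after QuantileTransformer:", skewness(X_quantile))
print("skewness after PowerTransformer:", skewness(X_power))
```

Both outputs should be far more symmetric than the input. The QuantileTransformer is non-parametric and distorts distances between points, so whether that is acceptable depends on the model that comes next.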

glemaitre answered Sep 21 '22 08:09
