 

Difference between StandardScaler and MinMaxScaler

What is the difference between MinMaxScaler() and StandardScaler()?

mms = MinMaxScaler(feature_range=(0, 1)) (used in one machine learning model)

sc = StandardScaler() (another machine learning model used StandardScaler instead of MinMaxScaler)

asked Jul 09 '18 by Chakra

4 Answers

MinMaxScaler(feature_range=(0, 1)) will transform each value in the column proportionally into the range [0, 1]. Use this as your first scaler choice for transforming a feature, as it preserves the shape of the original distribution (no distortion).

StandardScaler() will transform each value in the column so that the column has mean 0 and standard deviation 1, i.e., each value is standardized by subtracting the column mean and dividing by the standard deviation. Use StandardScaler if you know the data distribution is normal.

If there are outliers, use RobustScaler(). Alternatively, you could remove the outliers and use either of the two scalers above (the choice depends on whether the data is normally distributed).

Additional note: if the scaler is fitted before train_test_split, data leakage will occur. Fit the scaler on the training set only, after train_test_split (see the sketch below).
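
A minimal sketch of the above (my own toy data and variable names, not the asker's code): both scalers are fitted on the training split only and then applied to the test split, which avoids the leakage just described.

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import MinMaxScaler, StandardScaler

    # Toy data for illustration only
    rng = np.random.default_rng(0)
    X = rng.normal(loc=50, scale=10, size=(200, 2))
    y = rng.integers(0, 2, size=200)

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

    mms = MinMaxScaler(feature_range=(0, 1))
    X_train_mm = mms.fit_transform(X_train)   # fit on training data only
    X_test_mm = mms.transform(X_test)         # reuse the training min/max on the test set

    sc = StandardScaler()
    X_train_std = sc.fit_transform(X_train)   # fit on training data only
    X_test_std = sc.transform(X_test)         # reuse the training mean/std on the test set

    print(X_train_mm.min(axis=0), X_train_mm.max(axis=0))     # ~[0 0] and ~[1 1]
    print(X_train_std.mean(axis=0), X_train_std.std(axis=0))  # ~[0 0] and ~[1 1]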

answered Nov 13 '22 by Black Raven


From the scikit-learn documentation (the figures it refers to appear on that page):

StandardScaler removes the mean and scales the data to unit variance. However, the outliers have an influence when computing the empirical mean and standard deviation which shrink the range of the feature values as shown in the left figure below. Note in particular that because the outliers on each feature have different magnitudes, the spread of the transformed data on each feature is very different: most of the data lie in the [-2, 4] range for the transformed median income feature while the same data is squeezed in the smaller [-0.2, 0.2] range for the transformed number of households.

StandardScaler therefore cannot guarantee balanced feature scales in the presence of outliers.

MinMaxScaler rescales the data set such that all feature values are in the range [0, 1] as shown in the right panel below. However, this scaling compresses all inliers into the narrow range [0, 0.005] for the transformed number of households.
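
To see this effect on a small scale, here is a toy sketch (my own made-up data, not the California housing example from the documentation) comparing how a single large outlier affects MinMaxScaler, StandardScaler and the RobustScaler mentioned in the first answer:

    import numpy as np
    from sklearn.preprocessing import MinMaxScaler, StandardScaler, RobustScaler

    x = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [1000.0]])  # last value is an outlier

    for scaler in (MinMaxScaler(), StandardScaler(), RobustScaler()):
        print(type(scaler).__name__, np.round(scaler.fit_transform(x).ravel(), 3))
    # MinMaxScaler  : inliers squeezed near 0, e.g. [0. 0.001 0.002 0.003 0.004 1.]
    # StandardScaler: the outlier dominates the mean and standard deviation
    # RobustScaler  : based on median/IQR, so the inliers keep a usable spread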

answered Nov 13 '22 by Simas Joneliunas


Many machine learning algorithms perform better when numerical input variables are scaled to a standard range. Scaling the data helps to normalize it within a particular range.

MinMaxScaler performs what is also known as normalization: it transforms all values into the range [0, 1] using the formula x_scaled = (x - min) / (max - min).

StandardScaler performs standardization: for roughly normal data most of the resulting values fall between about -3 and +3, using the formula z = (x - mean) / std_deviation.
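
As a quick check of the two formulas (illustrative only, with a made-up column), they match what the scikit-learn transformers compute; note that StandardScaler uses the population standard deviation (ddof=0), which is also NumPy's default:

    import numpy as np
    from sklearn.preprocessing import MinMaxScaler, StandardScaler

    col = np.array([[10.0], [20.0], [30.0], [40.0]])

    # Normalization: x_scaled = (x - min) / (max - min)
    manual_minmax = (col - col.min()) / (col.max() - col.min())
    print(np.allclose(manual_minmax, MinMaxScaler().fit_transform(col)))      # True

    # Standardization: z = (x - mean) / std_deviation
    manual_standard = (col - col.mean()) / col.std()
    print(np.allclose(manual_standard, StandardScaler().fit_transform(col)))  # True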

answered Nov 13 '22 by Manoj Nahak


Before applying MinMaxScaler or StandardScaler you should know the distribution of your dataset.

StandardScaler rescales a dataset to have a mean of 0 and a standard deviation of 1. Standardization is very useful when the data has varying scales and the algorithm assumes the data follows a Gaussian distribution.

Normalization with MinMaxScaler rescales a dataset so that each value falls between 0 and 1. It is useful when the data has varying scales and the algorithm makes no assumptions about the distribution, for example when we do not know the distribution or we know it is not Gaussian.
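
One rough way to inspect the distribution before choosing a scaler is a normality test; the sketch below uses D'Agostino's K^2 test from SciPy, and the 0.05 threshold and the scaler suggestions are my own assumptions rather than part of this answer:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    features = {
        "gaussian-looking": rng.normal(size=500),   # synthetic example data
        "skewed": rng.exponential(size=500),
    }

    for name, values in features.items():
        _, p_value = stats.normaltest(values)       # D'Agostino's K^2 normality test
        suggestion = "StandardScaler" if p_value > 0.05 else "MinMaxScaler (or RobustScaler)"
        print(f"{name}: p={p_value:.3f} -> consider {suggestion}")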

answered Nov 13 '22 by Shishu Kumar Choudhary