 

Why do StandardScaler and Normalizer need different data input?

I was trying the following code and found that StandardScaler (or MinMaxScaler) and Normalizer from sklearn handle data very differently, which makes pipeline construction more difficult. Is this design discrepancy intentional?

from sklearn.preprocessing import StandardScaler, Normalizer, MinMaxScaler

For Normalizer, the data is read "horizontally".

Normalizer(norm='max').fit_transform([[1., 1., 2., 10.],
                                      [2., 0., 0., 100.],
                                      [0., -1., -1., 1000.]])
#array([[ 0.1  ,  0.1  ,  0.2  ,  1.   ],
#       [ 0.02 ,  0.   ,  0.   ,  1.   ],
#       [ 0.   , -0.001, -0.001,  1.   ]])

For StandardScaler and MinMaxScaler, the data is read "vertically".

StandardScaler().fit_transform([[ 1., 1.,  2., 10],
                                [ 2.,  0.,  0., 100],
                                [ 0.,  -1., -1., 1000]])
#array([[ 0.        ,  1.22474487,  1.33630621, -0.80538727],
#       [ 1.22474487,  0.        , -0.26726124, -0.60404045],
#       [-1.22474487, -1.22474487, -1.06904497,  1.40942772]])

MinMaxScaler().fit_transform([[ 1., 1.,  2., 10],
                              [ 2.,  0.,  0., 100],
                              [ 0.,  -1., -1., 1000]])
#array([[0.5       , 1.        , 1.        , 0.        ],
#       [1.        , 0.5       , 0.33333333, 0.09090909],
#       [0.        , 0.        , 0.        , 1.        ]])
user2547924 asked May 29 '26 12:05



1 Answer

This is expected behavior, because StandardScaler and Normalizer serve different purposes. The StandardScaler works 'vertically', because it...

Standardize[s] features by removing the mean and scaling to unit variance

[...] Centering and scaling happen independently on each feature by computing the relevant statistics on the samples in the training set. Mean and standard deviation are then stored to be used on later data using the transform method.

while the Normalizer works 'horizontally', because it...

Normalize[s] samples individually to unit norm.

Each sample (i.e. each row of the data matrix) with at least one non zero component is rescaled independently of other samples so that its norm (l1 or l2) equals one.
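To make the "vertical" versus "horizontal" distinction concrete, here is a minimal sketch that reproduces both transforms by hand with NumPy on the data from the question. It assumes the default StandardScaler settings and the norm='max' Normalizer used above; the variable names are illustrative only.

```python
import numpy as np
from sklearn.preprocessing import Normalizer, StandardScaler

X = np.array([[1., 1., 2., 10.],
              [2., 0., 0., 100.],
              [0., -1., -1., 1000.]])

# StandardScaler: one mean and one standard deviation per feature (column),
# i.e. statistics are computed along axis=0 across all samples.
col_mean = X.mean(axis=0)
col_std = X.std(axis=0)  # population std (ddof=0)
manual_standardized = (X - col_mean) / col_std

# Normalizer(norm='max'): one scaling factor per sample (row),
# i.e. the norm is computed along axis=1 and no other row is involved.
row_max = np.abs(X).max(axis=1, keepdims=True)
manual_normalized = X / row_max

# Both hand-written versions should agree with scikit-learn on this data.
assert np.allclose(manual_standardized, StandardScaler().fit_transform(X))
assert np.allclose(manual_normalized, Normalizer(norm='max').fit_transform(X))
```

The only difference between the two computations is the axis: axis=0 for the scaler (per feature), axis=1 for the normalizer (per sample).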

Please have a look at the scikit-learn docs for StandardScaler and Normalizer to get more insight into which one serves your purpose better.
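Regarding pipelines: both classes expose the same fit/transform interface, so they can be chained despite working along different axes. The sketch below is only an illustration under assumed settings (default StandardScaler, Normalizer with norm='l2'); X_new is a hypothetical unseen sample that does not appear in the question.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer, StandardScaler

X_train = np.array([[1., 1., 2., 10.],
                    [2., 0., 0., 100.],
                    [0., -1., -1., 1000.]])
X_new = np.array([[3., 2., 1., 50.]])  # hypothetical unseen sample

# StandardScaler is stateful: fit() stores per-column statistics,
# and transform() reuses them on later data.
scaler = StandardScaler().fit(X_train)
scaled_new = scaler.transform(X_new)

# Normalizer is stateless: fit() learns nothing from X_train,
# and each row is rescaled using only its own norm.
normalized_new = Normalizer(norm='l2').fit(X_train).transform(X_new)

# Chaining both: columns are standardized first, then each row is
# rescaled to unit l2 norm.
pipe = make_pipeline(StandardScaler(), Normalizer(norm='l2'))
out = pipe.fit(X_train).transform(X_new)
```

Whether such a chain makes sense depends on the model downstream; the point here is only that the differing axes do not prevent the two from being combined.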

rvf answered Jun 02 '26 05:06



