What is the difference between StandardScaler and Normalizer in the sklearn.preprocessing module? Don't both do the same thing, i.e., remove the mean and scale using the standard deviation?
Normalize samples individually to unit norm. Each sample (i.e. each row of the data matrix) with at least one non-zero component is rescaled independently of other samples so that its norm (l1, l2 or inf) equals one. This transformer is able to work both with dense numpy arrays and scipy.sparse matrices.
Standardize features by removing the mean and scaling to unit variance. The standard score of a sample x is calculated as z = (x - u) / s, where u is the mean of the training samples or zero if with_mean=False, and s is the standard deviation of the training samples or one if with_std=False.
Thus, StandardScaler() will standardize the features, i.e. each column of X, INDIVIDUALLY, so that each column/feature/variable will have μ = 0 and σ = 1.
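A minimal sketch of this column-wise behavior (the array values are arbitrary example data):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two features on very different scales.
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])

X_std = StandardScaler().fit_transform(X)

# Each COLUMN is now centered at 0 with standard deviation 1.
print(X_std.mean(axis=0))  # approximately [0. 0.]
print(X_std.std(axis=0))   # [1. 1.]
```

Note that each column is transformed using only its own mean and standard deviation; the rows play no special role.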
From the Normalizer docs:
Each sample (i.e. each row of the data matrix) with at least one non-zero component is rescaled independently of other samples so that its norm (l1 or l2) equals one.
And StandardScaler
Standardize features by removing the mean and scaling to unit variance
In other words, Normalizer acts row-wise and StandardScaler column-wise. Normalizer does not remove the mean or scale by the standard deviation; it rescales each whole row to unit norm.
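The row-wise rescaling can be sketched like this (again with arbitrary example values):

```python
import numpy as np
from sklearn.preprocessing import Normalizer

X = np.array([[3.0, 4.0],
              [1.0, 1.0]])

# Each ROW is divided by its own L2 norm, independently of other rows.
X_norm = Normalizer(norm='l2').fit_transform(X)

print(X_norm)                           # [3, 4] / 5 -> [0.6, 0.8]
print(np.linalg.norm(X_norm, axis=1))   # every row has norm 1
```

Note that the column means of `X_norm` are generally not zero: Normalizer never centers the data, it only scales each sample.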
This visualization and article by Ben help a lot in illustrating the idea.
The StandardScaler assumes your data is normally distributed within each feature. By "removing the mean and scaling to unit variance", you can see in the picture that the features now have the same "scale" regardless of their original ones.