Features are usually normalized prior to classification.
L1 and L2 normalization are usually used in the literature.
Could anybody comment on the advantages of the L2 norm over the L1 norm, and vice versa?
The L1-norm has the property of producing sparse solutions: many coefficients are exactly zero or very small, and only a few are large. The L2-norm has the advantage of computational efficiency. An L1-norm objective generally has no analytical solution, whereas an L2-norm objective does, so L2-norm solutions can be computed efficiently in closed form.
L2 normalization may be defined as the technique that rescales the dataset values so that, in each row, the sum of the squares is always 1. When the L2 norm is used as a loss function, it is also called least squares.
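To make the row-wise definition concrete, here is a minimal sketch in plain Python. The function name `normalize_rows` and the sample data are made up for illustration; they do not come from any particular library. Each row is divided by its own L2 norm (or L1 norm), and all-zero rows are left as they are to avoid dividing by zero.

```python
import math

def normalize_rows(rows, norm="l2"):
    """Rescale each row to unit L1 or L2 norm (hypothetical helper)."""
    result = []
    for row in rows:
        if norm == "l1":
            scale = sum(abs(x) for x in row)
        else:
            scale = math.sqrt(sum(x * x for x in row))
        # An all-zero row has norm 0; keep it unchanged.
        result.append([x / scale if scale else x for x in row])
    return result

data = [[3.0, 4.0], [1.0, -1.0], [0.0, 0.0]]
l2_rows = normalize_rows(data, norm="l2")
l1_rows = normalize_rows(data, norm="l1")

# After L2 normalization the squares in each non-zero row sum to 1;
# after L1 normalization the absolute values sum to 1.
print([sum(x * x for x in row) for row in l2_rows])
print([sum(abs(x) for x in row) for row in l1_rows])
```

For the row `[3.0, 4.0]` the L2 norm is 5, so the L2-normalized row is approximately `[0.6, 0.8]`, while the L1-normalized row is `[3/7, 4/7]`.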
The L1 norm is calculated as the sum of the absolute values of the vector. The L2 norm is calculated as the square root of the sum of the squared vector values. The max norm is calculated as the maximum of the absolute vector values.
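The three definitions above can be sketched in a few lines of plain Python. The function names are chosen here for illustration only.

```python
import math

def l1_norm(v):
    # Sum of absolute values.
    return sum(abs(x) for x in v)

def l2_norm(v):
    # Square root of the sum of squares.
    return math.sqrt(sum(x * x for x in v))

def max_norm(v):
    # Largest absolute value.
    return max(abs(x) for x in v)

v = [3.0, -4.0, 0.0]
print(l1_norm(v))   # 7.0
print(l2_norm(v))   # 5.0
print(max_norm(v))  # 4.0
```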
From a practical standpoint, L1 tends to shrink coefficients to zero whereas L2 tends to shrink coefficients evenly. L1 is therefore useful for feature selection, as we can drop any variables associated with coefficients that go to zero. L2, on the other hand, is useful when you have collinear/codependent features.
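The difference in shrinkage is easiest to see in a simplified special case. The sketch below assumes orthonormal features, where both penalized least-squares problems can be solved one coefficient at a time: with objective `0.5*||y - Xb||^2 + lam*||b||_1` the L1 solution soft-thresholds each unpenalized coefficient, and with `0.5*||y - Xb||^2 + 0.5*lam*||b||^2` the L2 solution divides each coefficient by `1 + lam`. The coefficient values and `lam` are invented for illustration, and real data with correlated features will not behave this cleanly.

```python
def soft_threshold(b, lam):
    # L1 penalty: move b toward zero by lam, and clip to exactly zero
    # when |b| <= lam.
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0

def ridge_shrink(b, lam):
    # L2 penalty: scale every coefficient by the same factor.
    return b / (1.0 + lam)

# Hypothetical unpenalized (least-squares) coefficients.
ols = [2.5, -0.3, 0.1, -1.2]
lam = 0.5

lasso = [soft_threshold(b, lam) for b in ols]
ridge = [ridge_shrink(b, lam) for b in ols]

print("L1:", lasso)
print("L2:", ridge)
```

With these numbers the two small coefficients become exactly zero under L1, so the corresponding features could be dropped, while under L2 every coefficient is scaled by the same factor and none reaches zero.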
Advantages of L2 over L1 norm
Advantages of L1 over L2 norm
More sources:
The same question on Quora
Another one