
Feature normalization: advantage of L2 normalization

Features are usually normalized prior to classification.

L1 and L2 normalization are usually used in the literature.
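For concreteness, here is a minimal sketch of what the two normalizations do to a single feature vector. The helper names `l1_normalize` and `l2_normalize` are made up for illustration and are not taken from any library; leaving zero vectors unchanged is an assumption of this sketch.

```python
import math


def l1_normalize(x):
    """Scale x so that the absolute values of its entries sum to 1."""
    norm = sum(abs(v) for v in x)
    # Assumption: an all-zero vector is returned unchanged.
    return [v / norm for v in x] if norm else list(x)


def l2_normalize(x):
    """Scale x so that the squares of its entries sum to 1 (unit length)."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x] if norm else list(x)


features = [3.0, -4.0]
print(l1_normalize(features))  # entries are 3/7 and -4/7
print(l2_normalize(features))  # [0.6, -0.8]
```

Both versions keep the direction of the vector and only change its scale; they differ in which notion of "length" is set to 1.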

Could anybody comment on the advantages of the L2 norm over the L1 norm, and vice versa?

user570593 asked Aug 28 '15 17:08




1 Answer

Advantages of L2 over L1 norm

  • As already stated by aleju in the comments, derivatives of the (squared) L2 norm are easy to compute. This makes it straightforward to use gradient-based learning methods.
  • Minimizing an L2 cost targets the mean (whereas an L1 cost targets the median), and the mean error is often used as a performance measure. This is especially good if you know you don't have any outliers and you want to keep the overall error small.
  • The solution is more likely to be unique. This ties in with the previous point: while the mean is a single value, the median of an even number of points can be any value in the interval between the two middle points and is therefore not unique.
  • While L1 regularization can give you a sparse coefficient vector, the non-sparseness of L2 can improve your prediction performance (since you leverage more features instead of simply ignoring them).
  • L2 is invariant under rotation. If you have a dataset consisting of points in a space and you apply a rotation, you still get the same results (i.e. the distances between points remain the same).
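The rotation and gradient points above can be checked with a small sketch. All function names here (`l1_norm`, `l2_norm`, `rotate_2d`, `grad_squared_l2`) are hypothetical helpers written for this illustration, and the example is restricted to 2D to keep the rotation simple.

```python
import math


def l1_norm(x):
    return sum(abs(v) for v in x)


def l2_norm(x):
    return math.sqrt(sum(v * v for v in x))


def rotate_2d(x, angle):
    """Rotate a 2D point counter-clockwise by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [c * x[0] - s * x[1], s * x[0] + c * x[1]]


def grad_squared_l2(x):
    """Gradient of sum(v**2): simply 2*v per component, defined everywhere."""
    return [2.0 * v for v in x]


point = [1.0, 0.0]
rotated = rotate_2d(point, math.pi / 4)

# The L2 norm is (up to rounding) unchanged by the rotation ...
print(l2_norm(point), l2_norm(rotated))
# ... while the L1 norm grows from 1 to about sqrt(2).
print(l1_norm(point), l1_norm(rotated))
# The gradient of the squared L2 norm is linear in the input.
print(grad_squared_l2([1.0, -2.0]))
```

By contrast, the gradient of the L1 norm is the sign of each component, which is undefined at 0. That is where gradient-based methods need extra care (for example subgradients).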

Advantages of L1 over L2 norm

  • The L1 norm prefers sparse coefficient vectors. (explanation on Quora) This means L1 regularization effectively performs feature selection: you can delete every feature whose coefficient is 0. Reducing the number of dimensions is often useful.
  • The L1 norm optimizes the median. Therefore the L1 norm is much less sensitive to outliers than the L2 norm.
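Both points can be illustrated with a rough sketch on toy numbers. The data and the helper names are made up for this example. The shrinkage functions assume the simplest one-dimensional setting: `soft_threshold` solves min over w of 0.5*(w - v)^2 + lam*|w|, and `ridge_shrink` solves min over w of 0.5*(w - v)^2 + 0.5*lam*w^2.

```python
import statistics


def squared_cost(c, data):
    """L2-style cost of summarizing data by the single value c."""
    return sum((v - c) ** 2 for v in data)


def absolute_cost(c, data):
    """L1-style cost of summarizing data by the single value c."""
    return sum(abs(v - c) for v in data)


def soft_threshold(v, lam):
    """1D L1-penalized solution: small values become exactly 0."""
    if v > lam:
        return v - lam
    if v < -lam:
        return v + lam
    return 0.0


def ridge_shrink(v, lam):
    """1D squared-L2-penalized solution: everything shrinks, nothing hits 0."""
    return v / (1.0 + lam)


# Outliers: the mean (L2 optimum) moves a lot, the median (L1 optimum) does not.
with_outlier = [1.0, 2.0, 3.0, 4.0, 100.0]
print(statistics.mean(with_outlier))    # 22.0
print(statistics.median(with_outlier))  # 3.0

# Sparsity: L1 zeroes the small coefficients, L2 only scales them down.
coefs = [3.0, 0.5, -0.2, -2.0]
print([soft_threshold(v, 1.0) for v in coefs])  # [2.0, 0.0, 0.0, -1.0]
print([ridge_shrink(v, 1.0) for v in coefs])
```

On this toy data, `squared_cost` is lower at the mean than at the median, and `absolute_cost` is lower at the median than at the mean, which is the sense in which each norm "optimizes" its statistic.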


Robin Spiess answered Oct 03 '22 17:10
