Features are usually normalized prior to classification, and both L1 and L2 normalization are common in the literature. Could anybody comment on the advantages of the L2 norm over the L1 norm, and vice versa?
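Because the question is about normalizing feature vectors, here is a minimal plain-Python sketch of the two options. The function names are hypothetical and chosen only for illustration. L1 normalization divides a vector by the sum of its absolute values, and L2 normalization divides it by its Euclidean length. scikit-learn offers the same operation as `sklearn.preprocessing.normalize` with `norm="l1"` or `norm="l2"`.

```python
import math

def l1_normalize(x):
    """Scale x so that the sum of its absolute values is 1 (hypothetical helper)."""
    norm = sum(abs(v) for v in x)
    if norm == 0:
        return list(x)  # leave an all-zero vector unchanged
    return [v / norm for v in x]

def l2_normalize(x):
    """Scale x so that its Euclidean length is 1 (hypothetical helper)."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm == 0:
        return list(x)  # leave an all-zero vector unchanged
    return [v / norm for v in x]

features = [3.0, 4.0, 0.0]
print(l1_normalize(features))  # ≈ [0.4286, 0.5714, 0.0]
print(l2_normalize(features))  # [0.6, 0.8, 0.0]
```

After L1 normalization the entries read like proportions. After L2 normalization, dot products between vectors become cosine similarities.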

feature normalization- advantage of l2 normalization

Advantages of L2 over L1 norm

As already stated by aleju in the comments, derivatives of the L2 norm are easy to compute: the squared L2 norm is differentiable everywhere, whereas the L1 norm has a kink at zero. Therefore it is also easy to use gradient-based learning methods.
An L2 (squared-error) objective optimizes the mean, whereas an L1 (absolute-error) objective optimizes the median, and the mean squared error is often used as a performance measure. This is especially good if you know you don't have any outliers and you want to keep the overall error small.
The solution is more likely to be unique. This ties in with the previous point: while the mean is a single value, the median can lie anywhere in the interval between the two middle points and is therefore not unique.
While L1 regularization can give you a sparse coefficient vector, the non-sparseness of L2 can improve your prediction performance (since you leverage more features instead of simply ignoring them).
L2 is invariant under rotation. If you have a dataset consisting of points in a space and you apply a rotation, you still get the same results (i.e. the distances between points remain the same).
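A quick numerical check of the mean/median, uniqueness and rotation points above, in plain Python. All helper names are hypothetical. With an even number of data points, the total squared error of a constant prediction has a single minimizer (the mean). The total absolute error, by contrast, is flat between the two middle values. Rotating two points leaves their Euclidean (L2) distance unchanged but changes their L1 distance.

```python
import math

# Hypothetical helpers: total squared / absolute error of a constant prediction c.
def sum_squared_error(c, data):
    return sum((x - c) ** 2 for x in data)

def sum_absolute_error(c, data):
    return sum(abs(x - c) for x in data)

data = [1.0, 2.0, 6.0, 7.0]
candidates = [2.0, 3.0, 4.0, 5.0, 6.0]

# Squared error: a single minimum at the mean (4.0).
print([sum_squared_error(c, data) for c in candidates])
# [42.0, 30.0, 26.0, 30.0, 42.0]

# Absolute error: every c between the two middle values (2 and 6)
# gives the same minimal cost, so the minimizer is not unique.
print([sum_absolute_error(c, data) for c in candidates])
# [10.0, 10.0, 10.0, 10.0, 10.0]

def rotate(p, theta):
    """Rotate a 2-D point p about the origin by theta radians."""
    x, y = p
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y)

def l2_dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def l1_dist(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

p, q = (0.0, 0.0), (1.0, 0.0)
rp, rq = rotate(p, math.pi / 4), rotate(q, math.pi / 4)
print(l2_dist(p, q), l2_dist(rp, rq))  # 1.0 and ~1.0 (equal up to rounding)
print(l1_dist(p, q), l1_dist(rp, rq))  # 1.0 and ~1.414 (changed by the rotation)
```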

Advantages of L1 over L2 norm

The L1 norm prefers sparse coefficient vectors (see the explanation on Quora). This means L1 regularization performs feature selection: you can delete all features whose coefficient is 0. Reducing the number of dimensions is useful in almost all cases.
The L1 norm optimizes the median. Therefore the L1 norm is much less sensitive to outliers.
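One common way to see both L1 points is the single-coefficient case, or equivalently an orthonormal design where each coefficient can be treated separately. Here the L1-penalized least-squares solution is soft-thresholding, which sets small coefficients exactly to zero, while the L2-penalized solution only shrinks them proportionally. This is a sketch under that assumption, not the Quora explanation itself, and the helper names are hypothetical. The second part shows how a single outlier drags the mean (the L2 minimizer) but leaves the median (the L1 minimizer) untouched.

```python
import statistics

# Hypothetical helpers for one coefficient v (or an orthonormal design):
# each returns the minimizer of 0.5*(w - v)**2 + penalty(w).

def l1_shrink(v, lam):
    """argmin_w 0.5*(w - v)**2 + lam*|w|: soft-thresholding."""
    if abs(v) <= lam:
        return 0.0  # small coefficients are set exactly to zero
    return v - lam if v > 0 else v + lam

def l2_shrink(v, lam):
    """argmin_w 0.5*(w - v)**2 + 0.5*lam*w**2: proportional shrinkage."""
    return v / (1.0 + lam)  # moves toward zero but never reaches it for v != 0

coefs = [2.0, 0.75, -0.25, 0.1]
lam = 0.5
print([l1_shrink(v, lam) for v in coefs])  # [1.5, 0.25, 0.0, 0.0]
print([l2_shrink(v, lam) for v in coefs])  # all four stay non-zero

# Outliers: the mean moves a lot, the median does not.
clean = [1.0, 2.0, 3.0, 4.0, 5.0]
dirty = [1.0, 2.0, 3.0, 4.0, 500.0]
print(statistics.mean(clean), statistics.mean(dirty))      # 3.0 102.0
print(statistics.median(clean), statistics.median(dirty))  # 3.0 3.0
```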

More sources:

The same question on Quora

Another one

answered Oct 03 '22 17:10

Robin Spiess

Tags: machine-learning, computer-vision

user570593
