Linear Regression: Normalization vs. Standardization

I am using linear regression to predict data, but I am getting totally contrasting results when I normalize versus standardize the variables.

Normalization: x' = (x - x_min) / (x_max - x_min)

Z-score standardization: x' = (x - x_mean) / x_std
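For concreteness, here is a minimal numpy sketch of both transforms (toy values, not my real data):

```python
import numpy as np

x = np.array([12.0, 7.0, 3.0, 25.0, 9.0])  # toy feature values

# Min-max normalization: rescales to the range [0, 1]
x_norm = (x - x.min()) / (x.max() - x.min())

# Z-score standardization: shifts to mean 0, rescales to standard deviation 1
x_std = (x - x.mean()) / x.std()

print(x_norm)  # all values fall in [0, 1]
print(x_std)   # mean ~0, standard deviation ~1
```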

a) When should I normalize versus standardize?
b) How does normalization affect linear regression?
c) Is it okay if I don't normalize all of the attributes/labels in the linear regression?

Thanks, Santosh

Asked Aug 20 '15 by Santosh Kumar



2 Answers

Note that the results might not necessarily be so different. You might simply need different hyperparameters for the two options to give similar results.

The ideal thing is to test what works best for your problem. If you can't afford to do that, most algorithms will probably benefit from standardization more than from normalization.
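One way to run that test is to cross-validate a pipeline with each scaler (a sketch using scikit-learn on synthetic data; substitute your own X and y):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Synthetic regression data; real data will usually show bigger gaps between scalers
X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)

for scaler in (StandardScaler(), MinMaxScaler()):
    pipe = make_pipeline(scaler, SGDRegressor(max_iter=1000, random_state=0))
    scores = cross_val_score(pipe, X, y, cv=5, scoring="r2")
    print(type(scaler).__name__, scores.mean())
```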

See here for some examples of when one should be preferred over the other:

For example, in clustering analyses, standardization may be especially crucial in order to compare similarities between features based on certain distance measures. Another prominent example is the Principal Component Analysis, where we usually prefer standardization over Min-Max scaling, since we are interested in the components that maximize the variance (depending on the question and if the PCA computes the components via the correlation matrix instead of the covariance matrix; but more about PCA in my previous article).

However, this doesn’t mean that Min-Max scaling is not useful at all! A popular application is image processing, where pixel intensities have to be normalized to fit within a certain range (i.e., 0 to 255 for the RGB color range). Also, typical neural network algorithms require data on a 0-1 scale.

One disadvantage of normalization over standardization is that it loses some information in the data, especially about outliers.

Also on the linked page, there is this picture:

[Image: plots of a standardized and a normalized data set]

As you can see, normalization clusters all the data very close together, which may not be what you want. It might cause algorithms such as gradient descent to take longer to converge to the same solution than they would on a standardized data set, or it might even make convergence impossible.
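Here is a toy sketch of that gradient descent effect, with made-up data containing two features on very different scales (the data and learning rates are illustrative, not a recipe):

```python
import numpy as np

rng = np.random.default_rng(0)
# Two informative features on wildly different scales
X = np.column_stack([rng.uniform(0, 1, 200), rng.uniform(0, 1000, 200)])
y = 3 * X[:, 0] + 0.05 * X[:, 1] + rng.normal(0, 0.1, 200)
y = y - y.mean()  # center the target so we can skip fitting an intercept

def mse_after_gd(X, y, lr, steps=1000):
    """Plain batch gradient descent on MSE; returns the final training MSE."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * 2 * X.T @ (X @ w - y) / len(y)
    return np.mean((X @ w - y) ** 2)

X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# On the raw features the learning rate must be tiny to avoid divergence,
# so progress along the small-scale feature is glacial.
print("raw:         ", mse_after_gd(X, y, lr=1e-6))
print("standardized:", mse_after_gd(X_std, y, lr=0.1))
```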

"Normalizing variables" doesn't really make sense. The correct terminology is "normalizing / scaling the features". If you're going to normalize or scale one feature, you should do the same for the rest.

Answered Sep 21 '22 by IVlad


That makes sense because normalization and standardization do different things.

Normalization rescales your data to the range between 0 and 1.

Standardization transforms your data so that the resulting distribution has a mean of 0 and a standard deviation of 1.
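You can check both properties directly with scikit-learn's scalers (a minimal sketch on toy values):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

x = np.array([[3.0], [7.0], [9.0], [12.0], [25.0]])  # one feature as a column

x_norm = MinMaxScaler().fit_transform(x)
x_std = StandardScaler().fit_transform(x)

print(x_norm.min(), x_norm.max())  # 0.0 1.0
print(x_std.mean(), x_std.std())   # ~0.0, 1.0
```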

Normalization and standardization are designed to achieve a similar goal: creating features that have similar ranges to each other. We want that so we can be sure we are capturing the true information in a feature, and so we don't overweight a particular feature just because its values are much larger than those of other features.

If all of your features are within a similar range of each other, then there's no real need to standardize/normalize. If, however, some features naturally take on values that are much larger or smaller than others, then normalization/standardization is called for.

If you're going to normalize at least one variable/feature, I would do the same thing to all of the others as well.

Answered Sep 19 '22 by Simon