
Normalization in scikit-learn linear models

If the normalize parameter is set to True in any of the linear models in sklearn.linear_model, is normalization also applied to the input data during the score step?

For example:

from sklearn import linear_model
from sklearn.datasets import load_boston

a = load_boston()

l = linear_model.ElasticNet(normalize=False)
l.fit(a["data"][:400], a["target"][:400])
print(l.score(a["data"][400:], a["target"][400:]))
# 0.24192774524694727

l = linear_model.ElasticNet(normalize=True)
l.fit(a["data"][:400], a["target"][:400])
print(l.score(a["data"][400:], a["target"][400:]))
# -2.6177006348389167

In this case we see a degradation in prediction power when we set normalize=True, and I can't tell whether this is simply an artifact of the score function not applying the normalization, or whether the normalized values genuinely caused the model's performance to drop.

asked Oct 20 '15 by mgoldwasser


People also ask

What does sklearn linear_model do?

linear_model is a module of sklearn that contains different classes for performing machine learning with linear models. The term linear model implies that the model is specified as a linear combination of features.
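As a quick illustration of what the module offers, a few of its estimator classes can be imported directly (this is a non-exhaustive sample):

```python
# A handful of the linear-model classes provided by sklearn.linear_model
from sklearn.linear_model import (
    LinearRegression,  # ordinary least squares
    Ridge,             # L2-regularized regression
    Lasso,             # L1-regularized regression
    ElasticNet,        # combined L1/L2 regularization
)

for cls in (LinearRegression, Ridge, Lasso, ElasticNet):
    print(cls.__name__)
```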

What does linear_model LinearRegression () do?

LinearRegression fits a linear model with coefficients w = (w1, …, wp) to minimize the residual sum of squares between the observed targets in the dataset and the targets predicted by the linear approximation. Its fit_intercept parameter controls whether to calculate the intercept for this model.
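A minimal sketch of fitting LinearRegression, using made-up noiseless data (y = 3*x + 5) so the recovered coefficient and intercept are exact:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data generated from y = 3*x + 5
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 3.0 * X[:, 0] + 5.0

reg = LinearRegression(fit_intercept=True).fit(X, y)
print(reg.coef_)       # approximately [3.]
print(reg.intercept_)  # approximately 5.0
```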

What is the l1 ratio?

The parameter l1_ratio corresponds to alpha in the glmnet R package while alpha corresponds to the lambda parameter in glmnet. Specifically, l1_ratio = 1 is the lasso penalty. Currently, l1_ratio <= 0.01 is not reliable, unless you supply your own sequence of alpha. Read more in the User Guide.
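This can be checked directly: with l1_ratio=1.0, ElasticNet should recover the same coefficients as Lasso at the same alpha. A sketch on synthetic data (the data and coefficient values here are arbitrary, chosen only for illustration):

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

# Synthetic regression problem with a sparse true coefficient vector
rng = np.random.RandomState(0)
X = rng.randn(50, 3)
y = X @ np.array([1.0, 0.0, -2.0]) + 0.1 * rng.randn(50)

# l1_ratio=1.0 makes the ElasticNet penalty a pure L1 (lasso) penalty
enet = ElasticNet(alpha=0.1, l1_ratio=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

print(np.allclose(enet.coef_, lasso.coef_))  # the two fits should agree
```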

What is Reg Coef_?

The coef_ attribute gives the fitted coefficients of the features in your dataset, one per feature.
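For instance, with two features, coef_ has two entries. A small sketch with invented data whose true coefficients are [2, -1]:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data generated exactly from y = 2*x0 - 1*x1 (no noise, no intercept)
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 3.0]])
y = 2.0 * X[:, 0] - 1.0 * X[:, 1]

model = LinearRegression().fit(X, y)
print(model.coef_)  # one coefficient per feature: approximately [ 2. -1.]
```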


1 Answer

The normalization is indeed applied to both fit data and predict data. The reason you see such different results is that the range of the columns in the Boston House Price dataset varies widely:

>>> from sklearn.datasets import load_boston
>>> boston = load_boston()
>>> boston.data.std(0)
array([  8.58828355e+00,   2.32993957e+01,   6.85357058e+00,
         2.53742935e-01,   1.15763115e-01,   7.01922514e-01,
         2.81210326e+01,   2.10362836e+00,   8.69865112e+00,
         1.68370495e+02,   2.16280519e+00,   9.12046075e+01,
         7.13400164e+00])

This means that the regularization terms in the ElasticNet have a very different effect on normalized vs unnormalized data, and this is why the results differ. You can confirm this by setting the regularization strength (alpha) to a very small number, e.g. 1E-8. In this case, regularization has very little effect and the normalization no longer affects prediction results.
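This check can be sketched as follows. Note that the normalize parameter has since been removed from scikit-learn's linear models, and newer versions no longer ship the Boston dataset, so this sketch uses synthetic data with deliberately mismatched column scales and standardizes the features by hand (roughly what normalize=True used to do):

```python
import numpy as np
from sklearn.linear_model import ElasticNet

# Synthetic stand-in for the Boston data: three features on very different scales
rng = np.random.RandomState(0)
X = rng.randn(100, 3) * np.array([1.0, 10.0, 0.1])
y = X @ np.array([1.5, 0.2, 8.0]) + 0.1 * rng.randn(100)
X_train, X_test = X[:80], X[80:]
y_train, y_test = y[:80], y[80:]

# Manually standardize, mimicking the effect of the old normalize=True option
mu, sigma = X_train.mean(0), X_train.std(0)
Xs_train, Xs_test = (X_train - mu) / sigma, (X_test - mu) / sigma

alpha = 1e-8  # near-zero regularization: the fit is effectively plain least squares
raw_score = ElasticNet(alpha=alpha, max_iter=100000).fit(X_train, y_train).score(X_test, y_test)
std_score = ElasticNet(alpha=alpha, max_iter=100000).fit(Xs_train, y_train).score(Xs_test, y_test)

# With regularization switched off, the two scores should be nearly identical
print(raw_score, std_score)
```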

answered Oct 04 '22 by jakevdp