Ridge Regression: Scikit-learn vs. direct calculation does not match for alpha > 0

Tags: python, scikit-learn, linear-regression

In Ridge Regression, we are solving Ax=b with L2 Regularization. The direct calculation is given by:

x = (A^TA + alpha * I)^-1A^Tb

I have looked at the scikit-learn code, and it appears to implement the same calculation. However, I can't get the same results for alpha > 0.

Minimal code to reproduce this:

import numpy as np
A = np.asmatrix(np.c_[np.ones((10,1)),np.random.rand(10,3)])
b = np.asmatrix(np.random.rand(10,1))
I = np.identity(A.shape[1])
alpha = 1
x = np.linalg.inv(A.T*A + alpha * I)*A.T*b
print(x.T)
>>> [[ 0.37371021  0.19558433  0.06065241  0.17030177]]

from sklearn.linear_model import Ridge
model = Ridge(alpha = alpha).fit(A[:,1:],b)
print(np.c_[model.intercept_, model.coef_])
>>> [[ 0.61241566  0.02727579 -0.06363385  0.05303027]]

Any suggestions on what I can do to resolve this discrepancy?

asked Jul 25 '16 08:07

amitkaps

1 Answer

This modification seems to yield the same result for the direct version and the scikit-learn version:

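import numpy as np
A = np.asmatrix(np.random.rand(10,3))
b = np.asmatrix(np.random.rand(10,1))
I = np.identity(A.shape[1])
alpha = 1
# Direct closed-form solution, with no intercept column
x = np.linalg.inv(A.T*A + alpha * I)*A.T*b
print(x.T)


from sklearn.linear_model import Ridge
# fit_intercept=False skips the centering step.
# np.asarray is used because recent scikit-learn versions reject np.matrix input.
model = Ridge(alpha = alpha, tol=0.1, fit_intercept=False).fit(np.asarray(A), np.asarray(b))

print(model.coef_)
print(model.intercept_)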

It seems the main reason for the difference is that the class Ridge has the parameter fit_intercept=True by default (inherited from class _BaseRidge).

With fit_intercept=True, the data are centered before the matrices are passed to the _solve_cholesky function.

Here's the line in ridge.py that does it:

        X, y, X_mean, y_mean, X_std = self._center_data(
        X, y, self.fit_intercept, self.normalize, self.copy_X,
        sample_weight=sample_weight)

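Also, it seems you were trying to implicitly account for the intercept by adding the column of 1's. As you can see, this is not necessary if you specify fit_intercept=False. Note that with the column of 1's in the direct formula, the intercept is penalized by alpha like any other coefficient, whereas Ridge with fit_intercept=True centers the data and leaves the intercept unpenalized. That is why the two results in the question differ.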
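To make the effect of centering concrete, here is a minimal sketch that reproduces the default Ridge(fit_intercept=True) result by hand. The variable names (X, y, Xc, yc, coef_direct, intercept_direct) and the fixed random seed are my own choices for illustration, and plain arrays are used instead of np.matrix. It assumes the dense cholesky solver, so the comparison is against a direct solve rather than an iterative one.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
X = rng.random((10, 3))
y = rng.random(10)
alpha = 1.0

# Center the features and the target, as Ridge does when fit_intercept=True
X_mean = X.mean(axis=0)
y_mean = y.mean()
Xc = X - X_mean
yc = y - y_mean

# Direct ridge solution on the centered data: only the slopes are penalized
coef_direct = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)

# Recover the unpenalized intercept from the means
intercept_direct = y_mean - X_mean @ coef_direct

model = Ridge(alpha=alpha, solver="cholesky").fit(X, y)

print(np.allclose(coef_direct, model.coef_))
print(np.allclose(intercept_direct, model.intercept_))
```

Both checks should print True, which suggests that the mismatch in the question comes from the penalized column of 1's rather than from the solver.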

Appendix: Does the Ridge class actually implement the direct formula?

It depends on the choice of the solver parameter.

If you do not specify the solver parameter in Ridge, it defaults to solver='auto', which for dense input like this internally resorts to solver='cholesky'. This should be equivalent to the direct computation.

Strictly speaking, _solve_cholesky solves the linear system with linalg.solve (imported from SciPy in ridge.py) instead of computing an explicit inverse with numpy.linalg.inv. But it can easily be checked that
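As a quick sanity check of that claim, the sketch below fits the same dense data with solver='auto' and solver='cholesky' and compares both to the closed-form solution. The data, seed, and variable names (auto, chol, direct) are assumptions made for illustration, and fit_intercept=False is used so that no centering is involved.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
X = rng.random((10, 3))
y = rng.random(10)
alpha = 1.0

# Same problem, fitted once with the default solver and once with an explicit one
auto = Ridge(alpha=alpha, solver="auto", fit_intercept=False).fit(X, y)
chol = Ridge(alpha=alpha, solver="cholesky", fit_intercept=False).fit(X, y)

# Closed-form ridge solution without an intercept
direct = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

print(np.allclose(auto.coef_, chol.coef_))
print(np.allclose(chol.coef_, direct))
```

For sparse input or other option combinations, 'auto' may pick a different solver, so this check only covers the dense case discussed here.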

np.linalg.solve(A.T*A + alpha * I, A.T*b)

yields the same as

np.linalg.inv(A.T*A + alpha * I)*A.T*b
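The equivalence of the two expressions can be verified numerically. This is a small sketch with random data and a fixed seed of my own choosing, using plain arrays and the @ operator instead of np.matrix.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
A = rng.random((10, 3))
b = rng.random((10, 1))
alpha = 1.0
I = np.identity(A.shape[1])

# Solve the regularized normal equations as a linear system
x_solve = np.linalg.solve(A.T @ A + alpha * I, A.T @ b)

# Same solution via an explicit matrix inverse
x_inv = np.linalg.inv(A.T @ A + alpha * I) @ A.T @ b

print(np.allclose(x_solve, x_inv))
```

In general, solving the system is preferred over forming the inverse because it is cheaper and numerically more stable, but for a small, well-conditioned problem like this the two agree to floating-point precision.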

answered Oct 20 '22 21:10

JARS
