As I understand it, numpy.linalg.lstsq and sklearn.linear_model.LinearRegression both look for solutions x of the linear system Ax = y that minimise the residual norm ||Ax - y||.
But they don't give the same result:
from sklearn import linear_model
import numpy as np
A = np.array([[1, 0], [0, 1]])
b = np.array([1, 0])
x, _, _, _ = np.linalg.lstsq(A, b, rcond=None)
x
Out[1]: array([ 1., 0.])
clf = linear_model.LinearRegression()
clf.fit(A, b)
coef = clf.coef_
coef
Out[2]: array([ 0.5, -0.5])
What am I overlooking?
Both of them are implemented on top of the LAPACK routine gelsd.
The difference is that linear_model.LinearRegression pre-processes the input X (your A) by default, while np.linalg.lstsq does not. You can refer to the source code of LinearRegression for the details; the pre-processing is
X = (X - X_offset) / X_scale
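With the default settings only the centering step matters (X_scale is all ones unless scaling is requested), so you can reproduce clf.coef_ from the question by centering A and b yourself before calling lstsq. A minimal sketch:

```python
import numpy as np

A = np.array([[1.0, 0.0], [0.0, 1.0]])
b = np.array([1.0, 0.0])

# Center the columns of A and center b, which is what
# LinearRegression does internally before solving for the coefficients.
A_centered = A - A.mean(axis=0)
b_centered = b - b.mean()

coef, _, _, _ = np.linalg.lstsq(A_centered, b_centered, rcond=None)
# coef is array([ 0.5, -0.5]), matching clf.coef_ from the question.
```

The centered system is rank-deficient here, and lstsq returns the minimum-norm solution, which is exactly what sklearn reports as coef_.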
If you don't want this pre-processing, set fit_intercept=False.
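For instance, with fit_intercept=False the coefficients agree with lstsq on the original (uncentered) data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

A = np.array([[1.0, 0.0], [0.0, 1.0]])
b = np.array([1.0, 0.0])

# No intercept: sklearn now solves the same raw system as lstsq.
clf = LinearRegression(fit_intercept=False)
clf.fit(A, b)
# clf.coef_ is array([1., 0.]), the same as
# np.linalg.lstsq(A, b, rcond=None)[0]
```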
Briefly speaking, if you standardize your input before the regression, linear_model.LinearRegression and np.linalg.lstsq give the same result, as below.
# Normalization/Scaling
from sklearn.preprocessing import StandardScaler
A = np.array([[1, 0], [0, 1]])
X_scaler = StandardScaler()
A = X_scaler.fit_transform(A)
Now A is array([[ 1., -1.],[-1., 1.]])
from sklearn import linear_model
import numpy as np
b = np.array([1, 0])
x, _, _, _ = np.linalg.lstsq(A, b, rcond=None)
x
Out[1]: array([ 0.25, -0.25])
clf = linear_model.LinearRegression()
clf.fit(A, b)
coef = clf.coef_
coef
Out[2]: array([ 0.25, -0.25])
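Conversely, you can make lstsq reproduce an intercept fit by appending a column of ones to the design matrix, so the last coefficient plays the role of clf.intercept_. A sketch on a small, hypothetical overdetermined example (when the augmented matrix is rank-deficient, lstsq's minimum-norm solution may differ from sklearn's centered fit):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[0.0], [1.0], [2.0]])
y = np.array([1.0, 3.0, 5.0])  # exactly y = 2*x + 1

# lstsq with an explicit column of ones for the intercept term.
X_aug = np.hstack([X, np.ones((3, 1))])
coef, _, _, _ = np.linalg.lstsq(X_aug, y, rcond=None)
# coef is [2., 1.]  (slope, intercept)

clf = LinearRegression().fit(X, y)
# clf.coef_ is [2.] and clf.intercept_ is 1., the same fit.
```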