
Difference between numpy.linalg.lstsq and sklearn.linear_model.LinearRegression


As I understand it, numpy.linalg.lstsq and sklearn.linear_model.LinearRegression both look for solutions x of the linear system Ax = y that minimise the residual norm ||Ax - y||.

But they don't give the same result:

from sklearn import linear_model
import numpy as np

A = np.array([[1, 0], [0, 1]])
b = np.array([1, 0])
x , _, _, _ = np.linalg.lstsq(A,b)
x

Out[1]: array([ 1.,  0.])

clf = linear_model.LinearRegression()
clf.fit(A, b)                              
coef = clf.coef_
coef

Out[2]: array([ 0.5, -0.5])

What am I overlooking?

asked Apr 12 '16 by fhchl

People also ask

What's the difference between using Sklearn LinearRegression and the Statsmodel?

A key difference between the two libraries is how they handle constants. Scikit-learn lets the user specify whether or not to add a constant through a parameter (fit_intercept), while with statsmodels the constant column is added explicitly to the design matrix, typically with sm.add_constant, before fitting OLS.
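For illustration, here is a minimal sketch of the two conventions with made-up toy data (y = 2*x + 1); it assumes statsmodels is installed:

import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])

# scikit-learn: the intercept is controlled by the fit_intercept parameter
clf = LinearRegression(fit_intercept=True).fit(X, y)
print(clf.intercept_, clf.coef_)   # ~1.0, [2.0]

# statsmodels: the constant column is added explicitly to the design matrix
res = sm.OLS(y, sm.add_constant(X)).fit()
print(res.params)                  # ~[1.0, 2.0]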

What does Linear_model LinearRegression () do?

LinearRegression fits a linear model with coefficients w = (w1, …, wp) to minimize the residual sum of squares between the observed targets in the dataset and the targets predicted by the linear approximation. Its fit_intercept parameter controls whether an intercept is calculated for the model.
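As a quick illustration (toy data made up here), the fitted coefficients minimize the residual sum of squares, so perturbing the slope away from the fit can only increase it:

import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 2.1, 2.9, 4.2])

clf = LinearRegression().fit(X, y)           # fit_intercept=True by default
rss = np.sum((y - clf.predict(X)) ** 2)      # residual sum of squares at the fit
print(clf.intercept_, clf.coef_, rss)

# any perturbation of the slope gives an equal or larger residual sum of squares
worse = clf.intercept_ + (clf.coef_[0] + 0.1) * X.ravel()
print(np.sum((y - worse) ** 2) >= rss)       # True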

What is Linalg Lstsq?

numpy.linalg.lstsq(a, b, rcond='warn') returns the least-squares solution to a linear matrix equation: it computes the vector x that approximately solves the equation a @ x = b.
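A small sketch of the call with made-up points (rcond=None selects the newer default cutoff and silences the FutureWarning):

import numpy as np

# overdetermined system: fit a line y = c0 + c1*t through three points
t = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 2.0, 2.5])
A = np.column_stack([np.ones_like(t), t])    # design matrix [1, t]

x, residuals, rank, sv = np.linalg.lstsq(A, y, rcond=None)
print(x)                      # least-squares coefficients [c0, c1]
print(np.allclose(A @ x, y))  # False: only an approximate solution exists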

What does Sklearn Linear_model do?

linear_model is a module of sklearn that contains different classes for performing machine learning with linear models. The term linear model implies that the model is specified as a linear combination of features.
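For example (a minimal sketch with toy data), several estimators in sklearn.linear_model share the same fit/predict interface:

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 1.0, 2.0, 3.0])

for model in (LinearRegression(), Ridge(alpha=1.0), Lasso(alpha=0.1)):
    print(type(model).__name__, model.fit(X, y).coef_)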


1 Answer

Both of them are implemented on top of LAPACK gelsd.

The difference is that linear_model.LinearRegression pre-processes the input X (your A) by default, as shown below, while np.linalg.lstsq does not. You can refer to the source code of LinearRegression for more details about this pre-processing.

X = (X - X_offset) / X_scale

If you don't want this pre-processing, set fit_intercept=False.
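To see this with the arrays from the question (a quick sketch): disabling the intercept makes LinearRegression solve the same plain least-squares problem as np.linalg.lstsq, and the coefficients agree:

import numpy as np
from sklearn.linear_model import LinearRegression

A = np.array([[1, 0], [0, 1]])
b = np.array([1, 0])

x, *_ = np.linalg.lstsq(A, b, rcond=None)
clf = LinearRegression(fit_intercept=False).fit(A, b)

print(x)          # [1. 0.]
print(clf.coef_)  # [1. 0.] -- matches lstsq once the centering is disabled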

In short, if you standardize your input before the regression, both linear_model.LinearRegression and np.linalg.lstsq give the same result, as below.

# Normalization/Scaling
import numpy as np
from sklearn.preprocessing import StandardScaler

A = np.array([[1, 0], [0, 1]])
X_scaler = StandardScaler()
A = X_scaler.fit_transform(A)

Now A is array([[ 1., -1.],[-1., 1.]])

from sklearn import linear_model
import numpy as np

b = np.array([1, 0])
x , _, _, _ = np.linalg.lstsq(A,b)
x
Out[1]: array([ 0.25, -0.25])

clf = linear_model.LinearRegression()
clf.fit(A, b)                              
coef = clf.coef_
coef

Out[2]: array([ 0.25, -0.25])
answered Sep 28 '22 by ybdesire