When I study Python scikit-learn, the first example that I come across is Generalized Linear Models. Here is the code of its very first example:
from sklearn import linear_model
reg = linear_model.LinearRegression()
reg.fit([[0, 0], [1, 1], [2, 2]], [0, 1, 2])
reg.coef_
array([ 0.5, 0.5])
Here I assume [[0, 0], [1, 1], [2, 2]] represents a data.frame containing x1 = c(0, 1, 2) and x2 = c(0, 1, 2), and that y = c(0, 1, 2) as well. Immediately, I begin to think that array([ 0.5, 0.5]) holds the coefficients for x1 and x2.
But are there standard errors for those estimates? What about t-tests, p-values, R² and other statistics?
Then I try to do the same thing in R:
X = data.frame(x1 = c(0, 1, 2), x2 = c(0, 1, 2), y = c(0, 1, 2))
lm(data = X, y ~ x1 + x2)
Call:
lm(formula = y ~ x1 + x2, data = X)

Coefficients:
(Intercept)           x1           x2
  1.282e-16    1.000e+00           NA
Obviously x1 and x2 are completely linearly dependent, so the OLS fit fails (R reports NA for the second coefficient). Why does sklearn still work and give this result? Am I understanding sklearn in the wrong way? Thanks.
Both solutions are correct (assuming that NA behaves like a zero): since x1 = x2 for every observation, any pair of coefficients summing to one (0.5/0.5 just as well as 1/0) fits the data perfectly. Which solution is favored depends on the numerical solver used by the OLS estimator.
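A quick sanity check that both coefficient vectors reproduce y exactly on this data:

import numpy as np

X = np.array([[0, 0], [1, 1], [2, 2]])
y = np.array([0, 1, 2])

# sklearn's answer and R's answer (reading NA as 0) fit equally well
for coef in (np.array([0.5, 0.5]), np.array([1.0, 0.0])):
    print(coef, X @ coef, np.allclose(X @ coef, y))
# [0.5 0.5] [0. 1. 2.] True
# [1. 0.] [0. 1. 2.] True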
sklearn.linear_model.LinearRegression is based on scipy.linalg.lstsq, which in turn calls the LAPACK gelsd routine described here: http://www.netlib.org/lapack/lug/node27.html
In particular, it says that when the problem is rank deficient it seeks the minimum norm least squares solution.
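You can see this directly by calling scipy yourself (a small sketch; the intercept is essentially zero for this data, so fitting without it gives the same coefficients):

import numpy as np
from scipy.linalg import lstsq

X = np.array([[0, 0], [1, 1], [2, 2]], dtype=float)
y = np.array([0, 1, 2], dtype=float)

# gelsd picks the minimum-norm solution when the problem is rank deficient
coef, residues, rank, sv = lstsq(X, y, lapack_driver='gelsd')
print(coef)  # [0.5 0.5]
print(rank)  # 1 -- X has rank 1, so there are infinitely many exact solutions

# Among all exact solutions, [0.5, 0.5] has the smallest Euclidean norm:
print(np.linalg.norm([0.5, 0.5]))  # ~0.7071
print(np.linalg.norm([1.0, 0.0]))  # 1.0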
If you want to favor the other solution, you can use a coordinate descent solver with a tiny bit of L1 penalty, as implemented in the Lasso class:
>>> from sklearn.linear_model import Lasso
>>> reg = Lasso(alpha=1e-8)
>>> reg.fit([[0, 0], [1, 1], [2, 2]], [0, 1, 2])
Lasso(alpha=1e-08, copy_X=True, fit_intercept=True, max_iter=1000,
normalize=False, positive=False, precompute=False, random_state=None,
selection='cyclic', tol=0.0001, warm_start=False)
>>> reg.coef_
array([ 9.99999985e-01, 3.97204719e-17])
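For contrast, a tiny L2 penalty keeps the minimum-norm behaviour; a quick sketch with the Ridge class:

>>> from sklearn.linear_model import Ridge
>>> reg = Ridge(alpha=1e-8).fit([[0, 0], [1, 1], [2, 2]], [0, 1, 2])
>>> reg.coef_  # close to array([ 0.5, 0.5])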