I am trying to do multiple-variable linear regression, but sklearn.linear_model is behaving very strangely. Here's my code:
import numpy as np
from sklearn import linear_model
b = np.array([3,5,7]).transpose() ## the right answer I am expecting
x = np.array([[1,6,9], ## 1*3 + 6*5 + 7*9 = 96
[2,7,7], ## 2*3 + 7*5 + 7*7 = 90
[3,4,5]]) ## 3*3 + 4*5 + 5*7 = 64
y = np.array([96,90,64]).transpose()
clf = linear_model.LinearRegression()
clf.fit([[1,6,9],
[2,7,7],
[3,4,5]], [96,90,64])
print(clf.coef_)              ## <== it gives me [-2.2 5 4.4], NOT [3, 5, 7]
print(np.dot(x, clf.coef_))   ## <== it gives me [ 67.4 61.4 35.4]
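Comparing the two printouts, y - np.dot(x, clf.coef_) is 96 - 67.4 = 90 - 61.4 = 64 - 35.4 = 28.6 in every row, so the products are off by a constant. A minimal sketch to check where that constant lives, assuming Python 3 and a current scikit-learn, and reusing the question's x and y:

```python
import numpy as np
from sklearn import linear_model

x = np.array([[1, 6, 9],
              [2, 7, 7],
              [3, 4, 5]])
y = np.array([96, 90, 64])

# Default settings: fit_intercept=True, so a constant offset is learned too.
clf = linear_model.LinearRegression()
clf.fit(x, y)

# np.dot(x, clf.coef_) leaves that offset out; it is stored in intercept_.
without_offset = np.dot(x, clf.coef_)
gap = y - without_offset
print(gap)              # the same value in every row
print(clf.intercept_)   # equal to that value

# predict() adds intercept_ back, and here the fit reproduces y exactly.
print(np.allclose(clf.predict(x), without_offset + clf.intercept_))
print(np.allclose(clf.predict(x), y))
```

In other words, the default model is not wrong; it fits y = xb + c, and np.dot(x, clf.coef_) only shows the xb part.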
To recover your original coefficients, pass the keyword argument fit_intercept=False when constructing the LinearRegression object.
import numpy as np
from sklearn import linear_model
b = np.array([3,5,7])
x = np.array([[1,6,9],
[2,7,7],
[3,4,5]])
y = np.array([96,90,64])
clf = linear_model.LinearRegression(fit_intercept=False)  # fit y = xb, no constant offset
clf.fit(x, y)
print(clf.coef_)              # approximately [3. 5. 7.]
print(np.dot(x, clf.coef_))   # approximately [96. 90. 64.]
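Without the intercept the problem is a square 3x3 system, and x is non-singular (its determinant is -44), so [3, 5, 7] is the only exact solution. As a cross-check with plain NumPy (this is not how LinearRegression is implemented internally, just the same equations solved directly):

```python
import numpy as np

x = np.array([[1, 6, 9],
              [2, 7, 7],
              [3, 4, 5]], dtype=float)
y = np.array([96, 90, 64], dtype=float)

# Non-zero determinant: y = x b has exactly one solution.
print(np.linalg.det(x))            # approximately -44
b = np.linalg.solve(x, y)
print(b)                           # approximately [3. 5. 7.]

# Least squares without an intercept gives the same coefficients.
b_lstsq, residuals, rank, sv = np.linalg.lstsq(x, y, rcond=None)
print(b_lstsq, rank)               # approximately [3. 5. 7.], rank 3
```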
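Using fit_intercept=False stops LinearRegression from centering the data. With the default fit_intercept=True it works with x - x.mean(axis=0) (and likewise centers y), then captures the means in a constant offset, so the model is y = xb + c; this is equivalent to adding a column of 1s to x. That gives four unknowns (three coefficients plus c) for only three equations, so there are infinitely many exact fits, and the solver returns one of them instead of [3, 5, 7] with c = 0.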
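To see the family of exact fits concretely: x times the vector [2, 0, 1] is [11, 11, 11], so shifting the coefficients by any multiple t of that vector moves every prediction by the same 11*t, which an intercept of -11*t cancels. The sketch below (plain NumPy; the names v and t_min are just for illustration) checks this and finds the member with the smallest coefficient norm. It comes out as the question's [-2.2 5 4.4] with offset 28.6, which is consistent with LinearRegression solving the centered problem by least squares; as far as I know, the solver it uses returns the minimum-norm solution when the system is underdetermined.

```python
import numpy as np

x = np.array([[1, 6, 9],
              [2, 7, 7],
              [3, 4, 5]])
y = np.array([96, 90, 64])
b = np.array([3, 5, 7])   # the coefficients the question expects (c = 0)
v = np.array([2, 0, 1])   # a direction along which x changes every row equally

print(np.dot(x, v))       # [11 11 11]

# Moving the coefficients by t*v raises every prediction by 11*t,
# so an intercept of c = -11*t cancels it: every t gives an exact fit.
for t in [0.0, -2.6, 1.0]:
    coef = b + t * v
    c = -11 * t
    print(t, coef, c, np.dot(x, coef) + c)

# The member with the smallest coefficient norm minimises |b + t*v|,
# i.e. t = -(b . v) / (v . v) = -13/5.
t_min = -np.dot(b, v) / np.dot(v, v)
print(t_min, b + t_min * v, -11 * t_min)
```

Setting fit_intercept=False removes c, and with it the freedom to slide along v, which is why the answer above pins down [3, 5, 7].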
As a side remark, calling transpose on a 1D array doesn't have any effect (it reverses the order of your axes, and you only have one).
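A quick illustration with the question's b; if an actual column vector is wanted, the second axis has to be added explicitly, for example with reshape:

```python
import numpy as np

b = np.array([3, 5, 7])
print(b.shape, b.transpose().shape)   # (3,) (3,)
print(np.array_equal(b, b.T))         # True

# An explicit second axis makes transpose meaningful.
col = b.reshape(-1, 1)
print(col.shape)                      # (3, 1)
print(col.T.shape)                    # (1, 3)
```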