Here is what I am doing:
$ python
Python 2.7.6 (v2.7.6:3a1db0d2747e, Nov 10 2013, 00:42:54)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
>>> import statsmodels.api as sm
>>> statsmodels.__version__
'0.5.0'
>>> import numpy
>>> y = numpy.array([1,2,3,4,5,6,7,8,9])
>>> X = numpy.array([1,1,2,2,3,3,4,4,5])
>>> res_ols = sm.OLS(y, X).fit()
>>> res_ols.params
array([ 1.82352941])
I had expected an array with two elements: the intercept and the slope coefficient. Where did the intercept go?
The OLS() function of the statsmodels.api module is used to perform OLS regression. It returns an OLS model object; the fit() method is then called on that object to fit the regression line to the data.
This is a very good question. First, we always need to add the constant ourselves. The constant column models the intercept, i.e. a constant offset that is present in all observations.
Ordinary Least Squares (OLS) regression is a supervised learning technique. It estimates the unknown parameters by fitting a model that minimizes the sum of squared errors between the observed values and the predicted ones.
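To make "minimizing the sum of squared errors" concrete (this sketch is an addition, not part of the original answer): with no constant column, the single coefficient statsmodels returns is just the least-squares solution for a one-column design matrix, i.e. a line through the origin. numpy.linalg.lstsq gives the same number:

import numpy

y = numpy.array([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)
X = numpy.array([1, 1, 2, 2, 3, 3, 4, 4, 5], dtype=float)

# One-column design matrix: a line through the origin, no intercept.
design = X[:, numpy.newaxis]

# Least-squares solution minimizing sum((y - design @ beta)**2).
beta, residuals, rank, sv = numpy.linalg.lstsq(design, y, rcond=None)
print(beta)  # approximately [ 1.82352941], matching sm.OLS(y, X).fit().params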
Try this:
X = sm.add_constant(X)
res = sm.OLS(y, X).fit()
res.params
as in the documentation:
An intercept is not included by default and should be added by the user
statsmodels.tools.tools.add_constant
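For reference (a small illustration added here, not a quote from the docs), add_constant simply adds a column of ones to the design matrix; in recent statsmodels versions it is placed first by default, which is why the intercept shows up as the first element of params below. The fitted intercept is the coefficient of that column of ones:

import numpy
import statsmodels.api as sm

X = numpy.array([1, 1, 2, 2, 3, 3, 4, 4, 5])
print(sm.add_constant(X))
# [[ 1.  1.]
#  [ 1.  1.]
#  [ 1.  2.]
#  ...
#  [ 1.  5.]]
# First column: the constant (intercept) term; second column: the original X.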
Just to be complete, this works:
>>> import numpy
>>> import statsmodels.api as sm
>>> y = numpy.array([1,2,3,4,5,6,7,8,9])
>>> X = numpy.array([1,1,2,2,3,3,4,4,5])
>>> X = sm.add_constant(X)
>>> res_ols = sm.OLS(y, X).fit()
>>> res_ols.params
array([-0.35714286,  1.92857143])
It does give me a different slope coefficient, but that makes sense, since we now also have an intercept.
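As a quick cross-check (again an added sketch, not part of the original answer), a degree-1 polynomial fit in numpy yields the same slope and intercept, so the with-constant statsmodels result is just the ordinary straight-line fit:

import numpy

y = numpy.array([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)
X = numpy.array([1, 1, 2, 2, 3, 3, 4, 4, 5], dtype=float)

# polyfit returns coefficients from highest degree down: [slope, intercept].
slope, intercept = numpy.polyfit(X, y, 1)
print(slope, intercept)  # approximately 1.92857143 -0.35714286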