I am running an OLS regression of Y on X using statsmodels.api.OLS. However, I get very different results depending on whether I add a constant to X first. Here is the code:
import statsmodels.api as sm
import numpy as np
mess = "SELECT .... FROM... WHERE ...."
data = np.array(db.extractData(mess))
Y = data[:, 0]  # first column: dependent variable
X = data[:, 1]  # second column: regressor
# Option 1: no constant
res = sm.OLS(Y, X).fit().rsquared  # returns 0.76
# Option 2: with a constant
X = sm.add_constant(X)
res = sm.OLS(Y, X).fit().rsquared  # returns 0.06
Given how much the result changes depending on whether I add the constant, I assume that I am doing something wrong. Thanks very much for your time.
First, you almost always need to add the constant. It acts as the intercept: it absorbs a constant offset that is present in all observations. Without it, the fitted line is forced through the origin.
Why do you need the add_constant() command when you're fitting a line using statsmodels? Without it, statsmodels fits a line with no intercept, i.e. one forced through the origin.
Add a column of ones to an array.
You need to add the constant. From the documentation: http://www.statsmodels.org/devel/generated/statsmodels.regression.linear_model.OLS.html
An intercept is not included by default and should be added by the user. See statsmodels.tools.add_constant.