 

Do I need to add a constant when using sm.OLS?

I am performing an OLS regression of Y on X using statsmodels.api.OLS. However, I get very different results depending on whether or not I add a constant to X first. Here is the code:

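import statsmodels.api as sm
import numpy as np

mess = "SELECT .... FROM... WHERE ...."
data = np.array(db.extractData(mess))  # db: my own database helper (not shown)
Y = data[:, 0]  # first column: dependent variable
X = data[:, 1]  # second column: regressor
# Option 1: no constant
res = sm.OLS(Y, X).fit().rsquared  # returns 0.76
# Option 2: prepend a column of ones (constant) to X
X = sm.add_constant(X)
res = sm.OLS(Y, X).fit().rsquared  # returns 0.06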

Given how large the difference is depending on whether I add the constant, I assume I am doing something wrong. Thanks very much for your time.

asked May 17 '15 at 10:05 by Dirty_Fox



People also ask

Do I need to add a constant in statsmodels?

In almost all cases, yes. sm.OLS does not include an intercept on its own, and the constant term absorbs the bias in the data (an offset that is shared by all observations). Leave it out only if you deliberately want a regression through the origin.
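To make the "bias" point concrete, here is a NumPy-only sketch on hypothetical synthetic data (true intercept 5, slope 2). It solves the same least-squares problem that sm.OLS solves, once with only the regressor and once with a column of ones in front of it. Without the constant, the line is forced through the origin and the slope has to soak up the missing offset.

```python
import numpy as np

# Hypothetical synthetic data: true intercept 5, true slope 2, plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 5.0 + 2.0 * x + rng.normal(0.0, 1.0, size=200)

# No constant column: the fitted line must pass through (0, 0),
# so the slope is inflated to compensate for the missing offset.
slope_only, *_ = np.linalg.lstsq(x[:, None], y, rcond=None)

# With a leading column of ones (the layout sm.add_constant produces by default),
# least squares estimates an intercept and a slope separately.
X = np.column_stack((np.ones_like(x), x))
intercept, slope = np.linalg.lstsq(X, y, rcond=None)[0]

print(f"no constant:   slope={slope_only[0]:.2f}")
print(f"with constant: intercept={intercept:.2f}, slope={slope:.2f}")
```

With the constant, the estimates land close to 5 and 2; without it, the slope comes out noticeably larger than 2.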

Why do you add a constant to the training set with sm.add_constant() when fitting a line using statsmodels?

Because sm.OLS does not add an intercept automatically. Without the column of ones that add_constant() supplies, statsmodels still fits a line, but one that is forced through the origin.
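For comparison, the formula interface (statsmodels.formula.api.ols) does add an intercept by default, and you remove it explicitly with "- 1". The sketch below uses hypothetical synthetic data and assumes pandas and statsmodels (with its patsy formula backend) are installed. It only checks which parameters each model estimates.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical synthetic data with a true intercept of 5 and slope of 2.
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.uniform(0, 10, 50)})
df["y"] = 5.0 + 2.0 * df["x"] + rng.normal(0, 1, 50)

# "y ~ x" includes an intercept automatically; "- 1" drops it (through the origin).
with_intercept = smf.ols("y ~ x", data=df).fit()
through_origin = smf.ols("y ~ x - 1", data=df).fit()

print(with_intercept.params.index.tolist())  # ['Intercept', 'x']
print(through_origin.params.index.tolist())  # ['x']
```

With the array interface (sm.OLS), the intercept comes only from the column you add yourself.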

What does sm.add_constant() do?

It adds a column of ones to an array (by default as the first column).
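As an illustration of what that column looks like, here is a hypothetical NumPy stand-in, add_constant_like, mimicking the default prepend=True layout. It is not the statsmodels implementation: the real sm.add_constant also detects an existing constant column (its has_constant parameter), which this sketch does not.

```python
import numpy as np

def add_constant_like(x):
    """Hypothetical stand-in for sm.add_constant's default layout:
    a column of ones placed in front of the regressors."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a 1-D array as a single regressor column
    return np.column_stack((np.ones(x.shape[0]), x))

print(add_constant_like([2.0, 3.5, 7.0]).tolist())
# [[1.0, 2.0], [1.0, 3.5], [1.0, 7.0]]
```

Each row gains a leading 1.0, so the first coefficient OLS estimates for that column is the intercept.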


1 Answer

You need to add the constant. From the documentation (http://www.statsmodels.org/devel/generated/statsmodels.regression.linear_model.OLS.html):

An intercept is not included by default and should be added by the user. See statsmodels.tools.add_constant.
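The documentation explains why the constant is needed, but not why the two R² values in the question differ so much. A likely reason is that statsmodels defines rsquared as 1 - ssr/centered_tss when the model includes a constant and as 1 - ssr/uncentered_tss when it does not, so the 0.76 and 0.06 are not on the same scale. The NumPy-only sketch below recomputes both definitions on hypothetical synthetic data (not the asker's data) where y sits far from zero and depends only weakly on x.

```python
import numpy as np

# Hypothetical data: y is large and only weakly related to x.
rng = np.random.default_rng(1)
x = rng.uniform(50, 60, size=100)
y = 100.0 + 0.1 * x + rng.normal(0.0, 1.0, size=100)

def ssr(design, y):
    """Residual sum of squares of the least-squares fit of y on design."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return resid @ resid

# Option 1 (no constant): R-squared uses the *uncentered* total sum of squares y'y.
ssr_no_const = ssr(x[:, None], y)
r2_uncentered = 1 - ssr_no_const / (y @ y)

# Option 2 (with constant): R-squared uses the *centered* total sum of squares.
X = np.column_stack((np.ones_like(x), x))
ssr_const = ssr(X, y)
r2_centered = 1 - ssr_const / np.sum((y - y.mean()) ** 2)

print(f"no constant:   SSR={ssr_no_const:.1f}, R2={r2_uncentered:.3f}")
print(f"with constant: SSR={ssr_const:.1f}, R2={r2_centered:.3f}")
```

The model without the constant has the larger residual sum of squares, that is, it fits worse, yet it reports the higher R², because its denominator y'y is dominated by the overall level of y rather than by its variation.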

answered Oct 24 '22 at 07:10 by anc1revv
