 

statsmodels add_constant for OLS intercept, what is this actually doing?

When reviewing linear regressions fitted with statsmodels OLS, I see that you have to use add_constant to add a constant '1' to every point in the independent variable(s) before fitting. However, my only understanding of the intercept in this context is the value of y on the fitted line when x equals 0, so I don't see what purpose always injecting a '1' here serves. What is this constant actually telling the OLS fit?

Tim Lindsey asked Dec 31 '16 02:12



People also ask

Why do we add a constant with statsmodels' add_constant?

Unless you have a specific reason to force the fitted line through the origin, you should add the constant. It lets the model estimate an intercept (bias) term: a constant offset shared by every observation. Adding a column of ones to X yourself has the same effect as calling add_constant().
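A minimal numpy sketch of that idea, assuming a single predictor and made-up, noise-free data generated from y = 3 + 2x: stacking a column of ones in front of x builds the same design matrix that sm.add_constant produces by default (it prepends the constant column), and least squares then returns the intercept as the coefficient on that column.

```python
import numpy as np

# Hypothetical toy data generated exactly from y = 3 + 2x
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 3.0 + 2.0 * x

# Prepend a column of ones, mirroring what sm.add_constant does by default
X = np.column_stack((np.ones_like(x), x))

# Ordinary least squares: choose beta to minimise ||y - X @ beta||^2
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = beta  # coefficient on the ones column is the intercept

print(intercept, slope)  # numerically 3 and 2
```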

What is statsmodels OLS?

The OLS() class in the statsmodels.api module performs ordinary least squares regression. Calling OLS(y, X) returns an OLS model object; calling its fit() method fits the regression to the data, and the summary() method of the fitted results produces tables that give an extensive description of the regression results.

What does sm.add_constant do in Python?

It adds a column of ones to the x1 array (data['SAT']), so the design matrix gains an extra regressor whose coefficient is the intercept.
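In matrix form (a sketch for a single predictor x with n observations, using b for the intercept and m for the slope), the column of ones is what lets the coefficient vector carry an intercept:

```latex
\mathbf{X} = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix},
\qquad
\boldsymbol{\beta} = \begin{bmatrix} b \\ m \end{bmatrix},
\qquad
\mathbf{X}\boldsymbol{\beta} = \begin{bmatrix} b + m x_1 \\ b + m x_2 \\ \vdots \\ b + m x_n \end{bmatrix}
```

OLS chooses β to minimize ‖y − Xβ‖², so b is estimated exactly like any other coefficient; its regressor simply equals 1 in every row.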

Why do you add a constant to the training set with sm.add_constant() when fitting a line with statsmodels?

Because sm.OLS does not add an intercept on its own. Without the column of ones it still fits a line, but one forced through the origin (y = mx) rather than y = mx + b.
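A small numpy sketch of what goes wrong without the constant, assuming hypothetical noise-free data from y = 5 + 2x (a true intercept of 5). The fit through the origin has to compensate for the missing intercept by distorting the slope, and it leaves residual error that the intercept model does not.

```python
import numpy as np

# Hypothetical data from y = 5 + 2x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 5.0 + 2.0 * x

# Without a constant column, OLS can only fit y = m*x (a line through the origin)
X_no_const = x.reshape(-1, 1)
(m_origin,), *_ = np.linalg.lstsq(X_no_const, y, rcond=None)

# With a constant column, OLS fits y = b + m*x
X_const = np.column_stack((np.ones_like(x), x))
(b_fit, m_fit), *_ = np.linalg.lstsq(X_const, y, rcond=None)

def sse(pred):
    """Sum of squared residuals for a vector of predictions."""
    return float(np.sum((y - pred) ** 2))

print(m_origin, sse(m_origin * x))           # slope inflated, nonzero error
print(b_fit, m_fit, sse(b_fit + m_fit * x))  # recovers b=5, m=2, error ~0
```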


2 Answers

sm.add_constant in statsmodels does the same job as the fit_intercept parameter of sklearn's LinearRegression(). If you don't call sm.add_constant, or if you use LinearRegression(fit_intercept=False), then both statsmodels and sklearn assume that b = 0 in y = mx + b, and they fit the model with b fixed at 0 instead of estimating b from your data.
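The two switches differ mechanically (sm.add_constant changes the data matrix, while fit_intercept is an option on the estimator), but they estimate the same intercept. A minimal numpy sketch on hypothetical random data shows two equivalent routes: appending a column of ones, or centering x and y, fitting without an intercept, and recovering b = ȳ − m·x̄ afterward. The scikit-learn docs describe fit_intercept=False as expecting centered data, which hints at this relationship; treat any claim about sklearn's internals here as an assumption.

```python
import numpy as np

# Hypothetical noisy data (fixed seed so the run is reproducible)
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=50)
y = 1.5 + 0.8 * x + rng.normal(scale=0.5, size=50)

# Route 1: explicit column of ones (the add_constant approach)
X = np.column_stack((np.ones_like(x), x))
(b1, m1), *_ = np.linalg.lstsq(X, y, rcond=None)

# Route 2: centre both variables, fit through the origin, then recover b
x_c = x - x.mean()
y_c = y - y.mean()
m2 = float(x_c @ y_c / (x_c @ x_c))   # slope of the no-intercept fit on centred data
b2 = float(y.mean() - m2 * x.mean())  # intercept implied by the centring

print(b1, m1)
print(b2, m2)  # should match route 1 up to floating-point error
```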

wi3o answered Nov 04 '22 08:11



It doesn't add a constant to your values; it adds a constant term to the linear equation being fitted. In the single-predictor case, it's the difference between fitting the line y = mx to your data and fitting y = mx + b.
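To make that distinction concrete, here is a pure-Python sketch (no libraries; the data and the helper names fit_through_origin and fit_with_intercept are hypothetical). Literally adding 1 to every x value and still fitting through the origin does not give you an intercept; adding a separate constant term to the equation does.

```python
# Hypothetical data from y = 5 + 2x
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [5.0 + 2.0 * x for x in xs]

def fit_through_origin(xs, ys):
    """Least-squares slope m for y = m*x (no intercept term)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def fit_with_intercept(xs, ys):
    """Least-squares (b, m) for y = m*x + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    m = num / den
    return y_bar - m * x_bar, m

# Misreading: add a constant to every x value, still fit y = m*x
shifted = [x + 1.0 for x in xs]
m_shift = fit_through_origin(shifted, ys)

# What add_constant actually enables: a separate intercept term b
b, m = fit_with_intercept(xs, ys)

print(m_shift)  # about 2.667: wrong slope, and still no intercept
print(b, m)     # 5.0 2.0
```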

BrenBarn answered Nov 04 '22 08:11
