 

Why would R-Squared decrease when I add an exogenous variable in OLS using python statsmodels

If I understand the OLS model correctly, this should never be the case?

from statsmodels.regression.linear_model import OLS

trades['const'] = 1
Y = trades['ret'] + trades['comms']
#X = trades[['potential', 'pVal', 'startVal', 'const']]
X = trades[['potential', 'pVal', 'startVal']]

ols = OLS(Y, X)
res = ols.fit()
res.summary()

If I turn the const on, I get an R-squared of 0.22, and with it off I get 0.43. How is that even possible?

asked Apr 16 '15 by RAY
1 Answer

See the answer here: Statsmodels: Calculate fitted values and R squared

R-squared follows a different definition depending on whether or not there is a constant in the model.

R-squared in a linear model with a constant uses the standard definition, which compares the fit against a mean-only model as the reference: the total sum of squares is demeaned.

R-squared in a linear model without a constant compares against a model that has no regressors at all, i.e. one where the effect of every regressor is zero. In this case the R-squared calculation uses a total sum of squares that is not demeaned.
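In symbols, writing SSR for the residual sum of squares of the fitted model, the two definitions are:

$$
R^2_{\text{centered}} = 1 - \frac{\sum_i (y_i - \hat y_i)^2}{\sum_i (y_i - \bar y)^2},
\qquad
R^2_{\text{uncentered}} = 1 - \frac{\sum_i (y_i - \hat y_i)^2}{\sum_i y_i^2}
$$

The denominators differ, so the two numbers are not comparable with each other.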

Since the definition changes when we add or drop a constant, the R-squared can go either way. The explained sum of squares itself will always increase when we add an explanatory variable, or stay unchanged if the new variable contributes nothing.

answered Sep 29 '22 by Josef