
Polynomial Regression Using statsmodels.formula.api

Please forgive my ignorance. All I'm trying to do is add a squared term to my regression without going through the trouble of defining a new column in my dataframe. I'm using statsmodels.formula.api (as stats) because the format is similar to R, which I am more familiar with.

hours_model = stats.ols(formula='act_hours ~ h_hours + C(month) + trend', data = df).fit()

The above works as expected.

hours_model = stats.ols(formula='act_hours ~ h_hours + h_hours**2 + C(month) + trend', data = df).fit()

This omits h_hours**2 and returns the same output as the line above.

I've also tried h_hours^2, math.pow(h_hours,2), and poly(h_hours,2). All throw errors.

Any help would be appreciated.

Matthew Withrow asked Dec 14 '22 08:12



1 Answer

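Wrap the squared term in I(), as you would in R. Inside a formula, ** is treated as a formula operator rather than arithmetic, so h_hours**2 collapses to plain h_hours; I() makes the expression evaluate as ordinary Python instead: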

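import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

np.random.seed(0)

# Random stand-in data with the same column names as in the question
df = pd.DataFrame({'act_hours':np.random.uniform(1,4,100),'h_hours':np.random.uniform(1,4,100),
                  'month':np.random.randint(0,3,100),'trend':np.random.uniform(0,2,100)})

model = 'act_hours ~ h_hours + I(h_hours**2)'
hours_model = smf.ols(formula = model, data = df)

# First five rows of the design matrix: intercept, h_hours, h_hours squared
hours_model.exog[:5,]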

array([[ 1.        ,  3.03344961,  9.20181654],
       [ 1.        ,  1.81002392,  3.27618659],
       [ 1.        ,  3.20558207, 10.27575638],
       [ 1.        ,  3.88656564, 15.10539244],
       [ 1.        ,  1.74625943,  3.049422  ]])
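To check this against the full formula from the question, here is a minimal sketch that fits the model with and without I() and compares the coefficient names. The data is random stand-in data (not the asker's), and the variable names plain and wrapped are just illustrative.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Random stand-in data using the column names from the question
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "act_hours": rng.uniform(1, 4, 100),
    "h_hours": rng.uniform(1, 4, 100),
    "month": rng.integers(0, 3, 100),
    "trend": rng.uniform(0, 2, 100),
})

# Without I(): ** is read as a formula operator, so no squared column is added
plain = smf.ols(
    "act_hours ~ h_hours + h_hours**2 + C(month) + trend", data=df
).fit()

# With I(): the expression is evaluated as arithmetic and becomes its own column
wrapped = smf.ols(
    "act_hours ~ h_hours + I(h_hours**2) + C(month) + trend", data=df
).fit()

print(list(plain.params.index))
print(list(wrapped.params.index))
```

The second list should contain one extra entry for the squared term, which is the coefficient that was missing from the original fit.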
StupidWolf answered Dec 21 '22 22:12
