Please forgive my ignorance. All I'm trying to do is add a squared term to my regression without going through the trouble of defining a new column in my dataframe. I'm using statsmodels.formula.api (as stats) because the format is similar to R, which I am more familiar with.
hours_model = stats.ols(formula='act_hours ~ h_hours + C(month) + trend', data = df).fit()
The above works as expected.
hours_model = stats.ols(formula='act_hours ~ h_hours + h_hours**2 + C(month) + trend', data = df).fit()
This omits h_hours**2 and returns the same output as the line above.
I've also tried h_hours^2, math.pow(h_hours,2), and poly(h_hours,2). All of them throw errors.
Any help would be appreciated.
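You can wrap the term in I(), just as you would in R. Inside a formula, ** is treated as a formula operator rather than arithmetic, so h_hours**2 collapses to plain h_hours; I() makes the expression evaluate as ordinary Python: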
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
np.random.seed(0)
df = pd.DataFrame({'act_hours': np.random.uniform(1, 4, 100),
                   'h_hours': np.random.uniform(1, 4, 100),
                   'month': np.random.randint(0, 3, 100),
                   'trend': np.random.uniform(0, 2, 100)})
# I() makes the formula parser evaluate h_hours**2 as ordinary arithmetic
model = 'act_hours ~ h_hours + I(h_hours**2)'
hours_model = smf.ols(formula=model, data=df)
hours_model.exog[:5,]  # first five rows of the design matrix
array([[ 1. , 3.03344961, 9.20181654],
[ 1. , 1.81002392, 3.27618659],
[ 1. , 3.20558207, 10.27575638],
[ 1. , 3.88656564, 15.10539244],
[ 1. , 1.74625943, 3.049422 ]])
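To check this against the full model from the question, here is a minimal sketch that fits both formulas on made-up data and compares their design-matrix columns. The DataFrame is random filler (not your data), and the names plain_fit and squared_fit are just illustrative.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the real data (assumption: same column names as the question)
rng = np.random.default_rng(0)
n = 100
df = pd.DataFrame({
    "act_hours": rng.uniform(1, 4, n),
    "h_hours": rng.uniform(1, 4, n),
    "month": rng.integers(0, 3, n),
    "trend": rng.uniform(0, 2, n),
})

# Without I(): ** is read as a formula operator, so no squared column is added
plain_fit = smf.ols(
    "act_hours ~ h_hours + h_hours**2 + C(month) + trend", data=df
).fit()

# With I(): the square is computed numerically and enters as its own regressor
squared_fit = smf.ols(
    "act_hours ~ h_hours + I(h_hours**2) + C(month) + trend", data=df
).fit()

# Compare the column names of the two design matrices
print(plain_fit.model.exog_names)
print(squared_fit.model.exog_names)
```

The second list should contain one extra column for the squared term, and squared_fit.params will then include a coefficient for it.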