Here is the R code:

model1 <- glm(
  wt82_71 ~ qsmk + sex + race + poly(age, 2, raw = TRUE) + education +
    poly(smokeintensity, 2, raw = TRUE) + poly(smokeyrs, 2, raw = TRUE) +
    exercise + active + poly(wt71, 2, raw = TRUE) + qsmk:smokeintensity,
  data = nhefs
)
In Python I wrote:
mod3 = smf.glm(formula='qsmk ~ sex + race + education + exercise + active'
                       ' + poly(age, 2) + poly(smokeintensity, 2)'
                       ' + poly(smokeyrs, 2) + poly(wt71, 2)',
               family=sm.families.Binomial(), data=nhefs).fit()
mod3.summary()
What would poly() be in Python?
Here are some comments from the R script:
model 1: regression on covariates, allowing for some effect modification
notes:
(1) poly(x, 2) adds an orthogonal polynomial of degree 2; add the argument raw = TRUE if you want it to produce the same coefficients as x + x^2 would (a numerical check of this follows these notes)
(2) x1*x2 enters the main effects of x1 and x2 plus their product term, while x1:x2 enters just the product term (necessary here for smokeintensity because we want it treated linearly in the interaction but quadratically in the main effect, so a separate linear term for smokeintensity is not estimable)
(3) observations with missing values are automatically deleted
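The claim in note (1) can be checked numerically. Below is a minimal sketch (made-up data, statsmodels OLS) showing that an orthogonalized quadratic basis, built here with a QR decomposition that mimics but does not exactly reproduce R's poly() scaling, gives different coefficients from the raw terms x + x^2 but identical fitted values:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 1 + 0.5 * x - 0.3 * x**2 + rng.normal(size=200)

# raw quadratic basis: [x, x^2]
raw = np.column_stack([x, x**2])

# orthogonalized basis: QR of [1, x, x^2], dropping the constant column;
# it spans the same column space as intercept + raw terms
Q, _ = np.linalg.qr(np.column_stack([np.ones_like(x), raw]))
orth = Q[:, 1:]

fit_raw = sm.OLS(y, sm.add_constant(raw)).fit()
fit_orth = sm.OLS(y, sm.add_constant(orth)).fit()

# the coefficients differ, but the fitted values agree
print(np.allclose(fit_raw.fittedvalues, fit_orth.fittedvalues))  # True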
AFAIK, poly for polynomial basis functions of continuous variables is not yet supported by patsy. The existing Poly is a contrast coding for ordered categorical variables.
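To illustrate the difference, the Poly contrast is used like this (a sketch only; it applies when a variable is treated as an ordered categorical, which is not what the question's continuous covariates need):

import statsmodels.api as sm
import statsmodels.formula.api as smf

# Poly here is polynomial *contrast coding* for an ordered categorical,
# not a polynomial basis for a continuous variable
mod = smf.glm('qsmk ~ C(education, Poly)',
              family=sm.families.Binomial(), data=nhefs).fit()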
NumPy has vander functions for various polynomial bases (e.g. np.vander, numpy.polynomial.polynomial.polyvander) that can be used directly in a formula.
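One way to do that is a small helper around np.vander (vpoly is a hypothetical name, not a patsy or statsmodels function); patsy picks up functions defined in the calling scope and accepts multi-column arrays as terms, so the sketch below assumes nhefs is the question's pandas DataFrame:

import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf

def vpoly(x, degree=2):
    # raw polynomial columns x^1 ... x^degree, dropping x^0 so the
    # constant column does not duplicate the model intercept
    return np.vander(np.asarray(x, dtype=float), degree + 1, increasing=True)[:, 1:]

mod3 = smf.glm('qsmk ~ sex + race + education + exercise + active'
               ' + vpoly(age, 2) + vpoly(smokeintensity, 2)'
               ' + vpoly(smokeyrs, 2) + vpoly(wt71, 2)',
               family=sm.families.Binomial(), data=nhefs).fit()
mod3.summary()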
Whether orthogonalizing on an existing dataset is useful or not is debatable. I prefer not to because then the basis functions do not change when the dataset changes.
Related patsy issue and pull request: https://github.com/pydata/patsy/issues/20 and https://github.com/pydata/patsy/pull/92/files
As an alternative, it is possible to specify the power terms directly in the formula (see the question "python stats models - quadratic term in regression"), but that will not orthogonalize.
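For example, here is a sketch of the question's R model written with raw power terms via I(), assuming the categorical covariates are coded the same way as in the question's formula; the qsmk:smokeintensity product term carries over unchanged because patsy uses the same : syntax for interaction-only terms:

import statsmodels.formula.api as smf

# I(x**2) plays the role of poly(x, 2, raw = TRUE) in the R code
model1 = smf.ols('wt82_71 ~ qsmk + sex + race + age + I(age**2) + education'
                 ' + smokeintensity + I(smokeintensity**2)'
                 ' + smokeyrs + I(smokeyrs**2) + exercise + active'
                 ' + wt71 + I(wt71**2) + qsmk:smokeintensity',
                 data=nhefs).fit()
model1.summary()

The coefficients on these squared terms should match the raw = TRUE parameterization of the R fit, provided the remaining covariates are coded identically.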