Say I have a dataframe (let's call it DF) where y is the dependent variable and x1, x2, x3 are my independent variables. In R I can fit a linear model using the following code, and the . will include all of my independent variables in the model:
# R code for fitting linear model
result = lm(y ~ ., data=DF)
I can't figure out how to do this with statsmodels using patsy formulas without explicitly adding all of my independent variables to the formula. Does patsy have an equivalent to R's .? I haven't had any luck finding it in the patsy documentation.
This formula specifies a model with 2 independent variables: x1 and the sum of x1 and x2.
A key difference between the two libraries is how they handle constants. Scikit-learn lets the user choose whether to fit an intercept through a parameter (fit_intercept), while statsmodels' OLS class does not add one on its own; instead, statsmodels provides an add_constant function that adds a constant column to a given array.
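To make "adding a constant" concrete, here is a minimal NumPy-only sketch. It is not scikit-learn or statsmodels code, and the data is invented for illustration; it only shows that fitting an intercept amounts to adding a column of ones to the design matrix.

```python
import numpy as np

# Made-up data that follows y = 2 + 3*x exactly (an assumption for illustration).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 + 3.0 * x

# "Adding a constant" means adding a column of ones to the predictors,
# so that the first fitted coefficient plays the role of the intercept.
X = np.column_stack([np.ones_like(x), x])

# Ordinary least squares on the augmented design matrix.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = coef
print(intercept, slope)
```

Without the column of ones, the fitted line would be forced through the origin, which is why it matters whether a library adds the constant for you.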
I haven't found an equivalent of . in the patsy documentation either. But what patsy lacks in conciseness, Python makes up for with strong string manipulation. So you can build a formula involving all the variable columns in DF using
# Index.drop removes "y" and keeps the remaining columns in their original order
all_columns = "+".join(DF.columns.drop("y"))
This gives x1+x2+x3 in your case. Finally, you can build the formula string with y on the left-hand side and pass it to any fitting procedure:
import statsmodels.formula.api as smf
my_formula = "y~" + all_columns
result = smf.ols(formula=my_formula, data=DF).fit()
No, this doesn't exist in patsy yet, unfortunately. See this issue.
As this is still not included in patsy, I wrote a small function that I call when I need to run statsmodels models with all columns (optionally with exceptions):
def ols_formula(df, dependent_var, *excluded_cols):
    '''
    Generates the R style formula for statsmodels (patsy) given
    the dataframe, dependent variable and optional excluded columns
    as strings
    '''
    df_columns = list(df.columns.values)
    df_columns.remove(dependent_var)
    for col in excluded_cols:
        df_columns.remove(col)
    return dependent_var + ' ~ ' + ' + '.join(df_columns)
For example, for a dataframe called df with columns y, x1, x2, x3, running ols_formula(df, 'y', 'x3') returns 'y ~ x1 + x2'.
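One caveat with joining column names into a formula string: a name that is not a valid Python identifier (for example one containing a space) will not parse as a formula term. patsy provides Q() for quoting such names. The sketch below is a hypothetical variant of the helper above, not part of patsy or statsmodels; the name quoted_formula is made up, it assumes all column names are strings, and it takes any iterable of names (such as df.columns) rather than the dataframe itself.

```python
import keyword


def quoted_formula(columns, dependent_var, *excluded_cols):
    """Build 'y ~ a + b' from column names, wrapping awkward names in Q(...).

    Hypothetical helper for illustration; assumes every column name is a str.
    """
    def term(name):
        # Plain identifiers can appear in a formula as they are.
        if name.isidentifier() and not keyword.iskeyword(name):
            return name
        # Anything else is wrapped in patsy's Q(); repr() supplies the quoting.
        return "Q({!r})".format(name)

    skip = {dependent_var, *excluded_cols}
    predictors = [term(c) for c in columns if c not in skip]
    return term(dependent_var) + " ~ " + " + ".join(predictors)


# A plain list stands in for df.columns here.
columns = ["y", "x1", "weight in kg", "x3"]
formula = quoted_formula(columns, "y", "x3")
print(formula)  # y ~ x1 + Q('weight in kg')
```

Unlike the list.remove approach, this version silently ignores excluded names that are not present, which may or may not be what you want.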