
Bootstrapping multiple regression parameters in Python

I'm trying to use bootstrapping to estimate multiple regression coefficients in Python, and I can't figure out how to implement it.

I use statsmodels.ols(formula='Y ~ A * B * C', ...) to run a single model. How can I implement a bootstrap that returns estimates and confidence intervals for all of the parameters returned by this ordinary least squares model?

I see there's potentially a bootstrap method in statsmodels, but I can't figure out how to import it or whether it has the functionality I want. There's another one (or a few) in scikits, but again, I can't figure out how to use them to estimate the many regression parameters the model returns.

Thanks for your help. I'm completely stumped -- and fairly new to Python.

Kara asked Oct 24 '25 05:10


1 Answer

You can use the resample package, which can be installed via pip. Here's the GitHub page: https://github.com/dsaxton/resample.

The doc folder contains a notebook with an example of precisely this kind of problem (it uses sklearn, but it can be adapted for statsmodels as well). Essentially, you define your modeling procedure as a function of the full data set (both the predictors and the response variable) that returns the model parameters in whatever format you like; here it returns a dictionary with the coefficients and the intercept. You then recompute that function on bootstrap samples using bootstrap from the resample.bootstrap module. In the code below, df is a pandas DataFrame containing the predictors and y is a Series with the response variable:

from resample.bootstrap import bootstrap
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression

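def fitreg(A):
    # A is a 2-D array: predictor columns first, response in the last column.
    scale = StandardScaler()
    reg = LinearRegression(fit_intercept=True)
    X_scale = scale.fit_transform(A[:, :-1])  # every column except the last
    y = A[:, -1]                              # last column is the response
    reg.fit(X_scale, y)
    return {"coef": reg.coef_, "intercept": reg.intercept_}

# Recompute fitreg on b=5000 bootstrap samples of the joined data.
boot_coef = bootstrap(a=df.join(y).values, f=fitreg, b=5000)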
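
If you'd rather stay with the statsmodels formula interface from the question, the same idea can be sketched by hand with numpy, without the resample package. This is only a minimal sketch under some assumptions: bootstrap_ols, n_boot, alpha and the simulated demo data are names and values made up for illustration, the predictors are assumed to be continuous (with categorical terms a resample could lose a level and change the parameter set), and the intervals are plain percentile intervals rather than anything more refined.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf


def bootstrap_ols(data, formula, n_boot=1000, alpha=0.05, seed=0):
    """Pairs (row-resampling) bootstrap for every coefficient of an OLS fit."""
    rng = np.random.default_rng(seed)
    n = len(data)
    # Fit once on the full data to get point estimates and parameter names.
    point = smf.ols(formula, data=data).fit().params
    draws = np.empty((n_boot, len(point)))
    for i in range(n_boot):
        # Resample whole rows with replacement so each Y stays with its predictors.
        idx = rng.integers(0, n, size=n)
        sample = data.iloc[idx].reset_index(drop=True)
        params = smf.ols(formula, data=sample).fit().params
        # Align by parameter name before storing the refit coefficients.
        draws[i] = params.reindex(point.index).to_numpy()
    # Percentile interval: empirical quantiles of the bootstrap draws.
    lower, upper = np.quantile(draws, [alpha / 2, 1 - alpha / 2], axis=0)
    return pd.DataFrame(
        {"estimate": point.to_numpy(), "ci_lower": lower, "ci_upper": upper},
        index=point.index,
    )


# Simulated data, purely for illustration.
rng = np.random.default_rng(42)
demo = pd.DataFrame(rng.normal(size=(200, 3)), columns=["A", "B", "C"])
demo["Y"] = (
    1.0 + 2.0 * demo["A"] - demo["B"] + 0.5 * demo["A"] * demo["B"]
    + rng.normal(scale=0.5, size=200)
)

result = bootstrap_ols(demo, "Y ~ A * B * C", n_boot=200)
print(result)
```

The result has one row per term of 'Y ~ A * B * C' (the intercept, the three main effects and all the interactions), with the full-data estimate and the lower and upper interval bounds as columns. Increasing n_boot gives more stable interval endpoints at the cost of more refits.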

dsaxton answered Oct 26 '25 19:10


