Return std and confidence intervals for out-of-sample prediction in StatsModels

I'd like to find the standard deviation and confidence intervals for an out-of-sample prediction from an OLS model.

This question is similar to Confidence intervals for model prediction, but with an explicit focus on using out-of-sample data.

Ideally there would be a function along the lines of wls_prediction_std(lm, data_to_use_for_prediction=out_of_sample_df) that returns prstd, iv_l, and iv_u for the out-of-sample dataframe.

For instance:

import pandas as pd
import random
import statsmodels.formula.api as smf
from statsmodels.sandbox.regression.predstd import wls_prediction_std

df = pd.DataFrame({"y": list(range(10)),
                   "x1": [(x*5 + random.random() * 2) for x in range(10)],
                   "x2": [(x*2.1 + random.random()) for x in range(10)]})

out_of_sample_df = pd.DataFrame({"x1":[(x*3 + random.random() * 2) for x in range(10)],
                                 "x2":[(x + random.random()) for x in range(10)]})

formula_string = "y ~ x1 + x2"
lm = smf.ols(formula=formula_string, data=df).fit()

# Prediction works fine:
print(lm.predict(out_of_sample_df))

# I can also get std and CI for in-sample data:
prstd, iv_l, iv_u = wls_prediction_std(lm)
print(prstd)

# I cannot figure out how to get std and CI for out-of-sample data:
try:
    print(wls_prediction_std(lm, exog=out_of_sample_df))
except ValueError as e:
    print(str(e))
    # prints "wrong shape of exog"

# Trying to concatenate the DFs:
df_both = pd.concat([df, out_of_sample_df],
                    ignore_index=True)

# Only returns results for the rows from df, not from out_of_sample_df
# (rows with a missing y are dropped when the model is fit)
lm2 = smf.ols(formula=formula_string, data=df_both).fit()
prstd2, iv_l2, iv_u2 = wls_prediction_std(lm2)
print(prstd2)

asked Sep 15 '15 18:09

2 Answers

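It looks like the problem is the format of the exog parameter: wls_prediction_std checks the new data against the model's design matrix (here an Intercept column alongside x1 and x2), so the raw two-column dataframe is rejected. The method below is taken entirely from a workaround by GitHub user thatneat. It is necessary because of a bug.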

import numpy as np
from patsy import dmatrix

def transform_exog_to_model(fit, exog):
    """Convert raw out-of-sample data into the design matrix the fitted model expects."""
    # The following is adapted from statsmodels.base.model.Results.predict()
    if hasattr(fit.model, 'formula') and exog is not None:
        # Apply the model's formula to the new data (this adds the Intercept column)
        exog = dmatrix(fit.model.data.orig_exog.design_info.builder,
                       exog)

    if exog is not None:
        exog = np.asarray(exog)
        if exog.ndim == 1 and (fit.model.exog.ndim == 1 or
                               fit.model.exog.shape[1] == 1):
            exog = exog[:, None]
        exog = np.atleast_2d(exog)  # needed in count model shape[1]

    # end adapted code
    return exog

transformed_exog = transform_exog_to_model(lm, out_of_sample_df)
print(transformed_exog)
prstd2, iv_l2, iv_u2 = wls_prediction_std(lm, transformed_exog, weights=[1])
print(prstd2)
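
To make clear why the intercept column matters, here is a minimal NumPy-only sketch of the quantity that wls_prediction_std reports for new rows with unit weights: the square root of the residual mean squared error plus the variance of the fitted mean. The data, the seed, and the helper add_intercept are made up for illustration and are not part of statsmodels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training data: y is linear in x1 and x2 plus noise.
n = 50
X_raw = rng.normal(size=(n, 2))
y = 1.0 + 2.0 * X_raw[:, 0] - 0.5 * X_raw[:, 1] + rng.normal(scale=0.3, size=n)

def add_intercept(x):
    """Prepend a column of ones so the columns line up with [Intercept, x1, x2]."""
    x = np.atleast_2d(x)
    return np.column_stack([np.ones(len(x)), x])

# Ordinary least squares fit.
X = add_intercept(X_raw)
params, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ params
df_resid = n - X.shape[1]
mse_resid = resid @ resid / df_resid
cov_params = mse_resid * np.linalg.inv(X.T @ X)  # 3 x 3, includes the intercept

# Out-of-sample rows need the same three columns as the fitted design matrix.
X_new = add_intercept(rng.normal(size=(5, 2)))
pred = X_new @ params

# Variance of the fitted mean for each new row: x' Cov(params) x
var_mean = np.einsum("ij,jk,ik->i", X_new, cov_params, X_new)

# Standard deviation of a new observation (unit weights assumed).
prstd = np.sqrt(mse_resid + var_mean)
print(prstd)
```

The interval bounds are then pred plus or minus a t quantile (with df_resid degrees of freedom) times prstd. A two-column array without the intercept cannot be multiplied against the 3 x 3 covariance matrix, which is what the "wrong shape of exog" error is guarding against.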

answered Sep 20 '22 06:09

canary_in_the_data_mine

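Additionally, you can use the get_prediction method.

# lm is the fitted OLS results object from the question
predictions = lm.get_prediction(out_of_sample_df)
predictions.summary_frame(alpha=0.05)

This returns both the confidence interval and the prediction interval. The resulting dataframe has the columns mean, mean_se, mean_ci_lower, mean_ci_upper, obs_ci_lower and obs_ci_upper: the mean_ci columns bound the fitted mean, and the obs_ci columns bound a new observation. I found the summary_frame() method buried here and you can find the get_prediction() method here. You can change the significance level of the confidence interval and prediction interval by modifying the "alpha" parameter.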
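
As a rough end-to-end sketch of this approach, the following recovers the three values the question asks for (prstd, iv_l, iv_u) from summary_frame(). The data and seed are invented for illustration; the prediction standard deviation is derived here from the mean_se column and lm.mse_resid, which assumes an unweighted OLS fit.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Hypothetical in-sample data.
df = pd.DataFrame({"x1": rng.normal(size=30), "x2": rng.normal(size=30)})
df["y"] = 1.0 + 2.0 * df["x1"] - 0.5 * df["x2"] + rng.normal(scale=0.3, size=30)

# Hypothetical out-of-sample data: same predictors, no y column.
out_of_sample_df = pd.DataFrame({"x1": rng.normal(size=5),
                                 "x2": rng.normal(size=5)})

lm = smf.ols("y ~ x1 + x2", data=df).fit()

# The formula is applied to the new dataframe, so no manual design matrix is needed.
frame = lm.get_prediction(out_of_sample_df).summary_frame(alpha=0.05)

# Standard deviation of a new observation: residual variance plus the
# variance of the fitted mean (assumes unit weights).
prstd = np.sqrt(frame["mean_se"] ** 2 + lm.mse_resid)

# Prediction interval bounds for new observations.
iv_l, iv_u = frame["obs_ci_lower"], frame["obs_ci_upper"]

print(frame)
print(prstd)
```

The obs_ci columns are wider than the mean_ci columns because they include the residual variance as well as the uncertainty in the fitted coefficients.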

answered Sep 22 '22 06:09

Julius

Tags: python, linear-regression, statsmodels, confidence-interval, standard-deviation
