
confidence and prediction intervals with StatsModels

I am running this linear regression with StatsModels:

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.sandbox.regression.predstd import wls_prediction_std

    n = 100

    x = np.linspace(0, 10, n)
    e = np.random.normal(size=n)
    y = 1 + 0.5*x + 2*e
    X = sm.add_constant(x)

    re = sm.OLS(y, X).fit()
    print(re.summary())

    prstd, iv_l, iv_u = wls_prediction_std(re)

My question is: are iv_l and iv_u the lower and upper bounds of the confidence interval, or of the prediction interval?

How do I get the other one?

I need the confidence and prediction intervals for all points, to do a plot.

F.N.B asked Jul 09 '13 22:07




2 Answers

For test data you can try to use the following.

    predictions = result.get_prediction(out_of_sample_df)
    predictions.summary_frame(alpha=0.05)

I found the summary_frame() method buried here and you can find the get_prediction() method here. You can change the significance level of the confidence interval and prediction interval by modifying the "alpha" parameter.

I am posting this here because this was the first post that came up when I searched for confidence and prediction intervals, even though this answer deals with test data rather than the fitted points.

Here's a function to take a model, new data, and an arbitrary quantile, using this approach:

    def ols_quantile(m, X, q):
        # m: OLS model.
        # X: X matrix.
        # q: Quantile.
        #
        # Set alpha based on q.
        a = q * 2
        if q > 0.5:
            a = 2 * (1 - q)
        predictions = m.get_prediction(X)
        frame = predictions.summary_frame(alpha=a)
        if q > 0.5:
            return frame.obs_ci_upper
        return frame.obs_ci_lower
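The mapping from quantile to alpha in ols_quantile can be checked in isolation. The sketch below is only an illustration: `alpha_for_quantile` is a hypothetical helper that repeats the same rule, and the degrees of freedom are an assumed value. It compares the t multiplier at the matching bound of a two-sided interval with the t quantile at q.

```python
from scipy import stats


def alpha_for_quantile(q):
    # Hypothetical helper: the same rule ols_quantile uses to pick alpha.
    return 2 * q if q <= 0.5 else 2 * (1 - q)


df = 98  # residual degrees of freedom, assumed for illustration

for q in (0.05, 0.5, 0.95):
    a = alpha_for_quantile(q)
    # A two-sided (1 - a) interval has its lower bound at the a/2 quantile
    # and its upper bound at the 1 - a/2 quantile of the t distribution.
    if q <= 0.5:
        bound = stats.t.ppf(a / 2, df)
    else:
        bound = stats.t.ppf(1 - a / 2, df)
    print(q, a, bound, stats.t.ppf(q, df))
```

The last two numbers printed on each line should agree, which is why the lower bound is returned for q at or below 0.5 and the upper bound otherwise.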
Julius answered Sep 20 '22 13:09



Update: see the other answer, which is more recent. Some of the models and results classes now have a get_prediction method that provides additional information, including prediction intervals and/or confidence intervals for the predicted mean.

old answer:

iv_l and iv_u give you the limits of the prediction interval for each point.

A prediction interval is the confidence interval for a new observation: on top of the uncertainty in the estimated mean, it includes the estimated variance of the error term.

I think the confidence interval for the mean prediction is not yet available in statsmodels. (Actually, the confidence interval for the fitted values is hiding inside the summary_table of outliers_influence, but I need to verify this.)

Proper prediction methods for statsmodels are on the TODO list.

Addition

Confidence intervals are there for OLS but the access is a bit clumsy.

To be included after running your script:

    import matplotlib.pyplot as plt
    from statsmodels.stats.outliers_influence import summary_table

    st, data, ss2 = summary_table(re, alpha=0.05)

    fittedvalues = data[:, 2]
    predict_mean_se = data[:, 3]
    predict_mean_ci_low, predict_mean_ci_upp = data[:, 4:6].T
    predict_ci_low, predict_ci_upp = data[:, 6:8].T

    # Check we got the right things (each difference should be close to zero)
    print(np.max(np.abs(re.fittedvalues - fittedvalues)))
    print(np.max(np.abs(iv_l - predict_ci_low)))
    print(np.max(np.abs(iv_u - predict_ci_upp)))

    # Data, fitted line, prediction band, and confidence band for the mean
    plt.plot(x, y, 'o')
    plt.plot(x, fittedvalues, '-', lw=2)
    plt.plot(x, predict_ci_low, 'r--', lw=2)
    plt.plot(x, predict_ci_upp, 'r--', lw=2)
    plt.plot(x, predict_mean_ci_low, 'r--', lw=2)
    plt.plot(x, predict_mean_ci_upp, 'r--', lw=2)
    plt.show()


This should give the same results as SAS, http://jpktd.blogspot.ca/2012/01/nice-thing-about-seeing-zeros.html

Josef answered Sep 19 '22 13:09
