I ran this linear regression with StatsModels:
    import numpy as np
    import statsmodels.api as sm
    from statsmodels.sandbox.regression.predstd import wls_prediction_std

    n = 100

    x = np.linspace(0, 10, n)
    e = np.random.normal(size=n)
    y = 1 + 0.5*x + 2*e
    X = sm.add_constant(x)

    re = sm.OLS(y, X).fit()
    print(re.summary())

    prstd, iv_l, iv_u = wls_prediction_std(re)
My questions: are iv_l and iv_u the lower and upper limits of the confidence interval or of the prediction interval?
How do I get the other one?
I need both the confidence and the prediction intervals for all points so I can plot them.
The prediction interval predicts in what range a future individual observation will fall, while a confidence interval shows the likely range of values associated with some statistical parameter of the data, such as the population mean.
You can get the prediction intervals by using the LRPI() class from the IPython notebook in my repo (https://github.com/shahejokarian/regression-prediction-interval). You need to set the t value to get the desired confidence level for the predicted values; otherwise the default is a 95% interval.
A 95% prediction interval of 100 to 110 hours for the life of a battery tells you that a future battery will fall into that range 95% of the time. There is a 5% chance that a battery will not fall into this interval.
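To make the distinction concrete, here is a minimal sketch that computes both intervals by hand for the simulated data in the question, using the textbook OLS formulas. This is not the LRPI() class from the repo; the fixed seed and all variable names are assumptions made for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed (assumption, for reproducibility)
n = 100
x = np.linspace(0, 10, n)
y = 1 + 0.5 * x + 2 * rng.normal(size=n)

# Design matrix with an intercept column, then ordinary least squares.
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
resid = y - fitted
dof = n - X.shape[1]
sigma2 = resid @ resid / dof  # estimate of the residual variance

# Leverage x_i' (X'X)^{-1} x_i for every row of X.
XtX_inv = np.linalg.inv(X.T @ X)
h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)

t_crit = stats.t.ppf(0.975, dof)     # two-sided 95% level
se_mean = np.sqrt(sigma2 * h)        # uncertainty of the fitted mean only
se_obs = np.sqrt(sigma2 * (1 + h))   # adds the noise of a new observation

mean_ci = (fitted - t_crit * se_mean, fitted + t_crit * se_mean)
pred_int = (fitted - t_crit * se_obs, fitted + t_crit * se_obs)
```

The only difference between the two standard errors is the extra `1 +` term, which is why the prediction interval is always the wider of the two.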
For test data you can try to use the following.
    predictions = result.get_prediction(out_of_sample_df)
    predictions.summary_frame(alpha=0.05)
I found the summary_frame() method buried here and you can find the get_prediction() method here. You can change the significance level of the confidence interval and prediction interval by modifying the "alpha" parameter.
I am posting this here because this was the first post that came up when I searched for confidence and prediction intervals, even though my answer deals with test data rather than the fitted data.
Here's a function to take a model, new data, and an arbitrary quantile, using this approach:
    def ols_quantile(m, X, q):
        # m: OLS model.
        # X: X matrix.
        # q: Quantile.
        #
        # Set alpha based on q.
        a = q * 2
        if q > 0.5:
            a = 2 * (1 - q)
        predictions = m.get_prediction(X)
        frame = predictions.summary_frame(alpha=a)
        if q > 0.5:
            return frame.obs_ci_upper
        return frame.obs_ci_lower
Update: see the second answer, which is more recent. Some of the models and results classes now have a get_prediction method that provides additional information, including prediction intervals and/or confidence intervals for the predicted mean.
old answer:
iv_l and iv_u give you the limits of the prediction interval for each point.
A prediction interval is the confidence interval for an observation and includes the estimate of the error.
I think the confidence interval for the mean prediction is not yet available in statsmodels. (Actually, the confidence interval for the fitted values is hiding inside the summary_table of influence_outlier, but I need to verify this.)
Proper prediction methods for statsmodels are on the TODO list.
Addition
Confidence intervals are there for OLS but the access is a bit clumsy.
To be included after running your script:
    import matplotlib.pyplot as plt
    from statsmodels.stats.outliers_influence import summary_table

    st, data, ss2 = summary_table(re, alpha=0.05)

    fittedvalues = data[:, 2]
    predict_mean_se = data[:, 3]
    predict_mean_ci_low, predict_mean_ci_upp = data[:, 4:6].T
    predict_ci_low, predict_ci_upp = data[:, 6:8].T

    # Check we got the right things
    print(np.max(np.abs(re.fittedvalues - fittedvalues)))
    print(np.max(np.abs(iv_l - predict_ci_low)))
    print(np.max(np.abs(iv_u - predict_ci_upp)))

    # Data, fitted line, prediction interval, and confidence interval for the mean
    plt.plot(x, y, 'o')
    plt.plot(x, fittedvalues, '-', lw=2)
    plt.plot(x, predict_ci_low, 'r--', lw=2)
    plt.plot(x, predict_ci_upp, 'r--', lw=2)
    plt.plot(x, predict_mean_ci_low, 'r--', lw=2)
    plt.plot(x, predict_mean_ci_upp, 'r--', lw=2)
    plt.show()
This should give the same results as SAS, http://jpktd.blogspot.ca/2012/01/nice-thing-about-seeing-zeros.html