Drawing regression line, confidence interval, and prediction interval in Python

Tags:

I'm new to the regression game and hope to plot a functionally arbitrary, nonlinear regression line (plus confidence and prediction intervals) for a subset of data that satisfies a certain condition (i.e. with mean replicate value exceeding a threshold; see below).

The data is generated for independent variable x across 20 different values: x=(20-np.arange(20))**2, with rep_num=10 replicates for each condition. The data shows strong nonlinearity across x and looks like the following:

import numpy as np

mu = [.40, .38, .39, .35, .37, .33, .34, .28, .11, .24,
      .03, .07, .01, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]     

data = np.zeros((20, rep_num))
for i in range(13):
    data[i] = np.clip(np.random.normal(loc=mu[i], scale=0.1, size=rep_num), 0., 1.)

I can make a scatter plot of the data; the replicate means are shown by the red dots:

import matplotlib.pyplot as plt

plt.scatter(np.log10(np.tile(x[:,None], rep_num)), data, 
            facecolors='none', edgecolors='k', alpha=0.25)
plt.plot(np.log10(x), data.mean(1), 'ro', alpha=0.8)
plt.plot(np.log10(x), np.repeat(0., 20), 'k--')
plt.xlim(-0.02, np.max(np.log10(x)) + 0.02)
plt.ylim(-0.01, 0.7)

scatter plot

My goal is to plot a regression line for only those data that have replicate mean > 0.02. In addition, I would like to add a 95% confidence interval (black dashed lines) around the regression, as well as a 95% prediction interval (blue dashed lines) -- ideally, the prediction interval can also be colored in with transparent blue background.

The final plot (without the blue background inside the prediction interval) would look something like this:

enter image description here

How would I make this? My online search yielded very different partial approaches using seaborn, scipy, and statsmodels. The applications of some of those template functions did not appear to work alongside the existing matplotlib scatter plot.

841

asked Aug 27 '17 22:08

neither-nor

1 Answers

OK, here's a shot at this (withouth prediction band, though). First of all you want to select the applicable data:

threshold = 0.02
reg_x = np.log10(x)[data.mean(1)>threshold]
reg_y = data.mean(1)[data.mean(1)>threshold]

Then you choose a model and perform a fit. Note, here I chose a second order polynomial but in principle you could do anything. For the fits I use kapteyn, this has a built-in confidence bans method, although it would be straightforward to implement (see e.g. Delta method)

from kapteyn import kmpfit

# Set model to fit.
def model(p, x):
    a, b, c = p
    return a + b*x + c*x**2

# Perform fit.
f = kmpfit.simplefit(model, [.1, .1, .1], reg_x, reg_y)

f contains all the estimated parameters and such, you can use that for plotting etc.

x = np.linspace(0, 3, 100)
plt.plot(x, model(f.params, x), linestyle='-', color='black', marker='')

For the confidence bands, we need the partial derivatives of the model with respect to the parameters (yes, some math). Again, this is easy for a polynomial model, shouldn't be a problem for any other model either.

# Partial derivatives:
dfdp = [1., reg_x, reg_x**2]
_, ci_upper, ci_lower = f.confidence_band(reg_x, dfdp, 0.95, model)

# Plot.
plt.plot(reg_x, ci_upper, linestyle='--', color='black', marker='')
plt.plot(reg_x, ci_lower, linestyle='--', color='black', marker='')

Unfortunately there is not prediction_bands() routine in the package, at least not that I know of. Assume you found some method for the prediction band, the plotting and preparation would look the same though..

p_upper, p_lower = prediction_band(*args, **kwargs)
plt.fill_between(reg_x, p_upper, p_lower, facecolor='blue', alpha=0.2, linestyle='')

Hope this helps, L.

answered Sep 23 '22 22:09

rammelmueller

Related questions
                            
                                Simple license protection for Python app
                            
                                Anomaly detection using Python [closed]
                            
                                sys.setswitchinterval in Python 3.2 and beyond
                            
                                Semantic Search in Python for hobbies + latest news
                            
                                UnicodeEncodeError: 'ascii' codec can't encode character [...]
                            
                                C Python: Running Python code within a context
                            
                                Python Popen().stdout.read() hang
                            
                                Streaming media files via DLNA/UPnP
                            
                                OpenCV and Numpy interacting badly
                            
                                sklearn selectKbest: which variables were chosen?
                            
                                How do I fix the deprecation warning that comes with pylab.pause?
                            
                                Python requests send certificate as string
                            
                                Where does my embedded python stdout go?
                            
                                Improve contour detection with OpenCV (Python)
                            
                                Can functions know if they are already multiprocessed in Python (joblib)
                            
                                Is there a clean way to suppress compiler warnings from Cython when using pyximport.install?
                            
                                Slow equality evaluation for identical objects (x == x)
                            
                                Tensorflow Windows Accessing Folders Denied:"NewRandomAccessFile failed to Create/Open: Access is denied. ; Input/output error"
                            
                                execinfo.h missing when installing xgboost in Cygwin
                            
                                Memory leak with TensorFlow

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Drawing regression line, confidence interval, and prediction interval in Python

Tags:

python

matplotlib

statistics

seaborn

regression

neither-nor

People also ask

1 Answers

rammelmueller

Recent Activity

Donate For Us