How to get confidence intervals from curve_fit

Tags: python, python-3.x, statistics, scipy, curve-fitting

My question involves statistics and Python, and I am a beginner in both. I am running a simulation, and for each value of the independent variable (X) I produce 1000 values of the dependent variable (Y). I calculated the average of Y for each value of X and fitted these averages using scipy.optimize.curve_fit. The curve fits nicely, but I also want to draw the confidence intervals. I am not sure whether what I am doing is correct or whether what I want can be done, but my question is: how can I get the confidence intervals from the covariance matrix produced by curve_fit? The code first reads the averages from files and then simply calls curve_fit.

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit


def readTDvsTx(L, B, P, fileformat):
    # L should be '_Fixed_' or '_'
    TD = []
    # the context manager closes the file even if parsing fails
    with open(fileformat.format(L, B, P), 'r') as infile:
        infile.readline()  # skip the header line
        for line in infile:
            # each line contains TxR followed by CD followed by TD
            fields = line.split()
            # float() parses the numbers without executing file contents, unlike eval()
            if 70 <= float(fields[0]) <= 190:
                TD.append(float(fields[2]))

    return np.array(TD)


def rec(x, a, b):
    return a * (1 / (x**2)) + b



fileformat = 'Densities_file{}BS{}_PRNTS{}.txt'
txR = np.array(range(70, 200, 20))
parents = np.array(range(1,6))
disc_p1 = readTDvsTx('_Fixed_', 5, 1, fileformat)


popt, pcov = curve_fit(rec, txR, disc_p1)


plt.plot(txR, rec(txR, popt[0], popt[1]), 'r-')
plt.plot(txR, disc_p1, '.')

print(popt)
plt.show()

And here is the resulting fit: [figure]

asked Sep 11 '16 08:09

osmak

1 Answer

Here's a quick and wrong answer: you can approximate the errors on your a and b parameters as the square roots of the diagonal entries of the covariance matrix: np.sqrt(np.diagonal(pcov)). The parameter uncertainties can then be used to draw the confidence intervals.

The answer is wrong because, before you fit your data to a model, you need an estimate of the errors on your averaged disc_p1 points. By averaging, you have lost the information about the scatter of the population, so curve_fit treats the y-points you feed it as absolute and indisputable. This might cause an underestimation of your parameter errors.
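To make the "quick" approach concrete, here is a minimal sketch of how the diagonal of pcov could be turned into intervals at a chosen confidence level. It assumes approximately normal residuals and uses a Student-t quantile with N minus the number of parameters as degrees of freedom; the y values are illustrative stand-ins for the averaged data, and the names alpha, dof and tval are introduced here for illustration only.

```python
import numpy as np
from scipy import stats
from scipy.optimize import curve_fit


def rec(x, a, b):
    # same model as in the question
    return a / x**2 + b


# illustrative stand-ins for the averaged simulation output
x = np.linspace(70, 190, 7)
y = np.array([786.31, 487.27, 341.78, 265.49, 224.76, 208.04, 200.22])

popt, pcov = curve_fit(rec, x, y)
perr = np.sqrt(np.diag(pcov))            # 1-sigma parameter uncertainties

alpha = 0.05                             # 95% confidence level (assumed choice)
dof = len(x) - len(popt)                 # degrees of freedom
tval = stats.t.ppf(1 - alpha / 2, dof)   # two-sided Student-t quantile

ci_lower = popt - tval * perr
ci_upper = popt + tval * perr
for name, lo, hi in zip("ab", ci_lower, ci_upper):
    print(f"{name}: [{lo:.4g}, {hi:.4g}]")
```

With only seven points the t quantile is noticeably larger than the familiar 1.96, so the intervals are wider than a plain two-sigma rule would suggest.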

To estimate the uncertainties of your averaged Y values, you need to measure their dispersion and pass it to curve_fit, telling it that your errors are absolute. Below is an example of how to do this for a random dataset where each of your points consists of 1000 samples drawn from a normal distribution.

from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
import numpy as np

# model function
func = lambda x, a, b: a * (1 / (x**2)) + b 

# approximating OP points
n_ypoints = 7 
x_data = np.linspace(70, 190, n_ypoints)

# approximating the original scatter in Y-data
n_nested_points = 1000
point_errors = 50
y_data = [func(x, 4e6, -100) + np.random.normal(x, point_errors,
          n_nested_points) for x in x_data]

# averages and dispersion of data
y_means = np.array(y_data).mean(axis = 1)
y_spread = np.array(y_data).std(axis = 1)

best_fit_ab, covar = curve_fit(func, x_data, y_means,
                               sigma = y_spread,
                               absolute_sigma = True)
sigma_ab = np.sqrt(np.diagonal(covar))

from uncertainties import ufloat
a = ufloat(best_fit_ab[0], sigma_ab[0])
b = ufloat(best_fit_ab[1], sigma_ab[1])
text_res = "Best fit parameters:\na = {}\nb = {}".format(a, b)
print(text_res)

# plotting the unaveraged data
flier_kwargs = dict(marker = 'o', markerfacecolor = 'silver',
                    markersize = 3, alpha=0.7)
line_kwargs = dict(color = 'k', linewidth = 1)
bp = plt.boxplot(y_data, positions = x_data,
                 capprops = line_kwargs,
                 boxprops = line_kwargs,
                 whiskerprops = line_kwargs,
                 medianprops = line_kwargs,
                 flierprops = flier_kwargs,
                 widths = 5,
                 manage_ticks = False)
# plotting the averaged data with calculated dispersion
#plt.scatter(x_data, y_means, facecolor = 'silver', alpha = 1)
#plt.errorbar(x_data, y_means, y_spread, fmt = 'none', ecolor = 'black')

# plotting the model
hires_x = np.linspace(50, 190, 100)
plt.plot(hires_x, func(hires_x, *best_fit_ab), 'black')
bound_upper = func(hires_x, *(best_fit_ab + sigma_ab))
bound_lower = func(hires_x, *(best_fit_ab - sigma_ab))
# plotting the confidence intervals
plt.fill_between(hires_x, bound_lower, bound_upper,
                 color = 'black', alpha = 0.15)
plt.text(140, 800, text_res)
plt.xlim(40, 200)
plt.ylim(0, 1000)
plt.show()

[Figure: absolutely weighted least squares]
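One point that may be worth checking for your own data: y_spread above is the standard deviation of the individual samples. If sigma is meant to describe the uncertainty of each averaged point, the standard error of the mean (the standard deviation divided by the square root of the sample count) is the usual statistic, assuming the samples are independent. A minimal sketch with made-up numbers (the loc and scale values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded only so the sketch is reproducible
n_samples = 1000

# hypothetical Y values simulated at a single X
samples = rng.normal(loc=300.0, scale=50.0, size=n_samples)

spread = samples.std(ddof=1)          # scatter of the individual samples
sem = spread / np.sqrt(n_samples)     # standard error of the mean

print("sample std:", spread)
print("standard error of the mean:", sem)
```

Which of the two belongs in sigma depends on what the fit is supposed to describe: the scatter of single simulation runs, or the precision of their average.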

Edit: If you are not considering the intrinsic errors on the data points, you are probably fine with using the "quick and wrong" case I mentioned before. The square roots of the diagonal entries of the covariance matrix can then be used to calculate your confidence intervals. However, note that the confidence intervals have shrunk now that we've dropped the uncertainties:

from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
import numpy as np

func = lambda x, a, b: a * (1 / (x**2)) + b

n_ypoints = 7
x_data = np.linspace(70, 190, n_ypoints)

y_data = np.array([786.31, 487.27, 341.78, 265.49,
                    224.76, 208.04, 200.22])
best_fit_ab, covar = curve_fit(func, x_data, y_data)
sigma_ab = np.sqrt(np.diagonal(covar))

# an easy way to properly format parameter errors
from uncertainties import ufloat
a = ufloat(best_fit_ab[0], sigma_ab[0])
b = ufloat(best_fit_ab[1], sigma_ab[1])
text_res = "Best fit parameters:\na = {}\nb = {}".format(a, b)
print(text_res)

plt.scatter(x_data, y_data, facecolor = 'silver',
            edgecolor = 'k', s = 10, alpha = 1)

# plotting the model
hires_x = np.linspace(50, 200, 100)
plt.plot(hires_x, func(hires_x, *best_fit_ab), 'black')
bound_upper = func(hires_x, *(best_fit_ab + sigma_ab))
bound_lower = func(hires_x, *(best_fit_ab - sigma_ab))
# plotting the confidence intervals
plt.fill_between(hires_x, bound_lower, bound_upper,
                 color = 'black', alpha = 0.15)
plt.text(140, 630, text_res)
plt.xlim(60, 200)
plt.ylim(0, 800)
plt.show()

[Figure: no-sigma case]
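The shaded bands in the plots above shift both parameters by one sigma at once, which ignores the correlation between a and b. As an alternative sketch, not the method used in the examples above, the full covariance matrix can be propagated through the model with first-order error propagation: the variance of the fitted curve at each x is J C Jᵀ, where J holds the partial derivatives of the model with respect to the parameters. The helper rec_jacobian is a hypothetical name introduced here.

```python
import numpy as np
from scipy.optimize import curve_fit


def rec(x, a, b):
    return a / x**2 + b


def rec_jacobian(x):
    # partial derivatives of rec with respect to (a, b), one row per x value
    return np.column_stack([1.0 / x**2, np.ones_like(x)])


x_data = np.linspace(70, 190, 7)
y_data = np.array([786.31, 487.27, 341.78, 265.49,
                   224.76, 208.04, 200.22])
popt, pcov = curve_fit(rec, x_data, y_data)

hires_x = np.linspace(50, 200, 100)
J = rec_jacobian(hires_x)

# variance of the fitted curve at each x: row_i(J) @ pcov @ row_i(J)
band = np.sqrt(np.einsum("ij,jk,ik->i", J, pcov, J))

y_fit = rec(hires_x, *popt)
bound_lower = y_fit - band   # 1-sigma band around the fitted curve
bound_upper = y_fit + band
```

The resulting bound_lower and bound_upper can be passed to plt.fill_between exactly as in the examples above. Because this model is linear in a and b, the first-order propagation is exact here; for models that are nonlinear in their parameters it is only an approximation.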

If you're unsure whether to include the absolute errors or how to estimate them in your case, you'd be better off asking for advice at Cross Validated, as Stack Overflow is mainly for discussion on implementations of regression methods and not for discussion on the underlying statistics.

answered Oct 28 '22 20:10

Vlas Sokolov
