Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Curve fit or interpolation in a semilogy plot using scipy

I have very few data points and I want to create a line to best fit the data points when plotted in a semilogy scale. I have tried curve-fit and cubic interpolation from scipy, but none of them seems to be very reasonable to me compared to the data trend.

I would kindly ask you to check if there is a more efficient way to create a straight line fit for the data. Probably extrapolation can do, but I did not find a good documentation on extrapolation on python.

your help is very appreciated

import sys
import os
import numpy
import matplotlib.pyplot as plt
from pylab import *
from scipy.optimize import curve_fit
import scipy.optimize as optimization
from scipy.interpolate import interp1d
from scipy import interpolate

enter image description here

Mass500 = numpy.array([ 13.938 , 13.816,  13.661,  13.683,  13.621,  13.547,  13.477,  13.492,  13.237,
  13.232,  13.07,   13.048,  12.945,  12.861,  12.827,  12.577,  12.518])

y500 = numpy.array([  7.65103978e-06,   4.79865790e-06,   2.08218909e-05,   4.98385924e-06,
   5.63462673e-06,   2.90785458e-06,   2.21166794e-05,   1.34501705e-06,
   6.26021870e-07,   6.62368879e-07,   6.46735547e-07,   3.68589447e-07,
   3.86209019e-07,   5.61293275e-07,   2.41428755e-07,   9.62491134e-08,
   2.36892162e-07])



plt.semilogy(Mass500, y500, 'o')


# interpolation
f2 = interp1d(Mass500, y500, kind='cubic')
plt.semilogy(Mass500, f2(Mass500), '--')


# curve-fit
def line(x, a, b):
    return 10**(a*x+b)

#Initial guess.
x0     = numpy.array([1.e-6, 1.e-6])

print optimization.curve_fit(line, Mass500, y500, x0)
popt, pcov = curve_fit(line, Mass500, y500)
print popt
plt.semilogy(Mass500, line(Mass500, popt[0], popt[1]), 'r-')



plt.legend(['data', 'cubic', 'curve-fit'], loc='best')

show()
like image 920
bhjghjh Avatar asked Jan 29 '23 19:01

bhjghjh


1 Answers

There are many regression functions available in numpy and scipy. scipy.stats.lingress is one of the simpler functions, and it returns common linear regression parameters.

Here are two options for fitting semi-log data:

  1. Plot Transformed Data
  2. Rescale Axes and Transform Input/Output Function Values

Given

import numpy as np
import scipy as sp
import matplotlib.pyplot as plt

%matplotlib inline


# Data
mass500 = np.array([ 
    13.938 , 13.816,  13.661,  13.683,
    13.621,  13.547,  13.477,  13.492,
    13.237,  13.232,  13.07,   13.048,  
    12.945,  12.861,  12.827,  12.577,  
    12.518
])

y500 = np.array([  
    7.65103978e-06,   4.79865790e-06,   2.08218909e-05,   4.98385924e-06,
    5.63462673e-06,   2.90785458e-06,   2.21166794e-05,   1.34501705e-06,
    6.26021870e-07,   6.62368879e-07,   6.46735547e-07,   3.68589447e-07,
    3.86209019e-07,   5.61293275e-07,   2.41428755e-07,   9.62491134e-08,
    2.36892162e-07
])

Code

Option 1: Plot Transformed Data

# Regression Function
def regress(x, y):
    """Return a tuple of predicted y values and parameters for linear regression."""
    p = sp.stats.linregress(x, y)
    b1, b0, r, p_val, stderr = p
    y_pred = sp.polyval([b1, b0], x)
    return y_pred, p

# Plotting
x, y = mass500, np.log(y500)                      # transformed data
y_pred, _ = regress(x, y)

plt.plot(x, y, "mo", label="Data")
plt.plot(x, y_pred, "k--", label="Pred.")
plt.xlabel("Mass500")
plt.ylabel("log y500")                            # label axis
plt.legend()

Output

enter image description here

A simple approach is to plot transformed data and label the appropriate log axes.


Option 2: Rescale Axes and Transform Input/Output Function Values

Code

x, y = mass500, y500                              # data, non-transformed
y_pred, _ = regress(x, np.log(y))                 # transformed input             

plt.plot(x, y, "o", label="Data")
plt.plot(x, np.exp(y_pred), "k--", label="Pred.") # transformed output
plt.xlabel("Mass500")
plt.ylabel("y500")
plt.semilogy()
plt.legend()

Output

enter image description here

A second option is to alter the axes to semi-log scales (via plt.semilogy()). Here the non-transformed data naturally appears linear. Also notice the labels represent the data as-is.

To make an accurate regression, all that remains is to transform data passed into the regression function (via np.log(x) or np.log10(x)) in order to return the proper regression parameters. This transformation is immediately reversed when plotting predicated values using a complementary operation, i.e. np.exp(x) or 10**x.

like image 162
pylang Avatar answered Jan 31 '23 07:01

pylang