How can I fit two linear functions to a set of data points?

Tags:

I have a set of data points, which appear sort of like a line with a curve near the beginning. See the image below, which shows the points with a line of best fit (fit to the whole data set).

Plot with one line of best fit

Instead, they could be described by two linear functions (a line through the leftmost set of points and a separate line through the rest of the data points). What these points actually correspond to are neutron decay, which contains two different isotopes.

I don't know which points correspond to which isotope, so I need to somehow make a best guess. The curve for one isotope will be a straight line, and the curve for the other will be a different straight line. How can I fit two different lines of best fit (linear) to the set of data points, such that the fitting for both is optimized?

One idea I had is to pick a "cutouff point", say at t=100 (x-axis), and fit the points on the left to a line, and the points to the right to a different line. I could then compute chi^2 for both lines in order to get the "goodness" of the fits. Then, I could just keep doing the same thing many times with slightly different cutoff points, until I find the pair of lines that gives the best overall fits.

A different idea, which seems more involved, is to describe these data points as a combination of the two lines, y= m1*t + m2*t + b1 + b2, where the ms are the slopes and bs are the y-intercepts. Then, taking the derivative of the total curve, I would have dy/dt = m1+m2. Then perhaps I could cycle through different "cutouff points", and fit to lines until I get a combination where the derivative is closest to m1+m2. But I'm not sure how to do this, since I'm not working with one curve initially, just a bunch of discrete data points.

What would be the best way to go about this optimization of two fits in Python?

438

asked Oct 25 '18 19:10

curious_cosmo

2 Answers

This can be interpreted as a time-series segmentation problem in combination with linear regression. There are multiple approaches to tackle this problem. One of these you already mentioned: a manually selected point where to segment the data, another one is trying to minimize the error.

First I try to recreate the data:

import numpy as np; import matplotlib.pyplot as plt
y1 = np.linspace(5.5, 3.7, num=100)
y1 = y1 + np.random.rand(y1.shape[0]) * np.linspace(0, .3, num=y1.shape[0])
y2 = np.linspace(3.7, 1.1, num=500)
y2 = y2 + np.random.rand(y2.shape[0]) * np.linspace(0.3, 1.9, num=y2.shape[0])
y = np.append(y1, y2)
x = np.array(range(len(y)))

Then I do two linear fits using numpy.linalg.lstsq which itself is based on the method of least squares:

x1 = x[:100]
y1 = y[:100]
A1 = np.vstack([x1, np.ones(len(x1))]).T
m1, c1 = np.linalg.lstsq(A1, y1, rcond=None)[0]

x2 = x[100:]
y2 = y[100:]
A2 = np.vstack([x2, np.ones(len(x2))]).T
m2, c2 = np.linalg.lstsq(A2, y2, rcond=None)[0]

Plotting this results in the following image:

plt.scatter(x, y)
plt.plot(x1, m1 * x1 + c1, 'r')
plt.plot(x2, m2 * x2 + c2, 'r')
plt.show()

plot

You now could use an automatic segmentation algorithm like SWAB to replace the [100:] and [:100] slices, but I'd advise against it, if you are able to manually decide, at which point to split the data. Have a look at the link provided, if you're looking for an implementation.

171

answered Oct 21 '22 10:10

Felix

Here is an example of fitting two straight lines to one data set, with the crossover point between the two lines also fitted as a parameter. This example uses scipy's Differential Evolution (DE) genetic algorithm to determine initial parameter estimates. The scipy implementation of DE uses the Latin Hypercube algorithm to ensure a thorough search of parameter space, and this algorithm requires bounds within which to search - in this example those bounds are taken from the data maximum and minimum values.

import numpy, scipy, matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.optimize import differential_evolution
import warnings

xData = numpy.array([19.1647, 18.0189, 16.9550, 15.7683, 14.7044, 13.6269, 12.6040, 11.4309, 10.2987, 9.23465, 8.18440, 7.89789, 7.62498, 7.36571, 7.01106, 6.71094, 6.46548, 6.27436, 6.16543, 6.05569, 5.91904, 5.78247, 5.53661, 4.85425, 4.29468, 3.74888, 3.16206, 2.58882, 1.93371, 1.52426, 1.14211, 0.719035, 0.377708, 0.0226971, -0.223181, -0.537231, -0.878491, -1.27484, -1.45266, -1.57583, -1.61717])
yData = numpy.array([0.644557, 0.641059, 0.637555, 0.634059, 0.634135, 0.631825, 0.631899, 0.627209, 0.622516, 0.617818, 0.616103, 0.613736, 0.610175, 0.606613, 0.605445, 0.603676, 0.604887, 0.600127, 0.604909, 0.588207, 0.581056, 0.576292, 0.566761, 0.555472, 0.545367, 0.538842, 0.529336, 0.518635, 0.506747, 0.499018, 0.491885, 0.484754, 0.475230, 0.464514, 0.454387, 0.444861, 0.437128, 0.415076, 0.401363, 0.390034, 0.378698])


def func(xArray, breakpoint, slopeA, offsetA, slopeB, offsetB):
    returnArray = []
    for x in xArray:
        if x < breakpoint:
            returnArray.append(slopeA * x + offsetA)
        else:
            returnArray.append(slopeB * x + offsetB)
    return returnArray


# function for genetic algorithm to minimize (sum of squared error)
def sumOfSquaredError(parameterTuple):
    warnings.filterwarnings("ignore") # do not print warnings by genetic algorithm
    val = func(xData, *parameterTuple)
    return numpy.sum((yData - val) ** 2.0)


def generate_Initial_Parameters():
    # min and max used for bounds
    maxX = max(xData)
    minX = min(xData)
    maxY = max(yData)
    minY = min(yData)
    slope = 10.0 * (maxY - minY) / (maxX - minX) # times 10 for safety margin

    parameterBounds = []
    parameterBounds.append([minX, maxX]) # search bounds for breakpoint
    parameterBounds.append([-slope, slope]) # search bounds for slopeA
    parameterBounds.append([minY, maxY]) # search bounds for offsetA
    parameterBounds.append([-slope, slope]) # search bounds for slopeB
    parameterBounds.append([minY, maxY]) # search bounds for offsetB


    result = differential_evolution(sumOfSquaredError, parameterBounds, seed=3)
    return result.x

# by default, differential_evolution completes by calling curve_fit() using parameter bounds
geneticParameters = generate_Initial_Parameters()

# call curve_fit without passing bounds from genetic algorithm
fittedParameters, pcov = curve_fit(func, xData, yData, geneticParameters)
print('Parameters:', fittedParameters)
print()

modelPredictions = func(xData, *fittedParameters) 

absError = modelPredictions - yData

SE = numpy.square(absError) # squared errors
MSE = numpy.mean(SE) # mean squared errors
RMSE = numpy.sqrt(MSE) # Root Mean Squared Error, RMSE
Rsquared = 1.0 - (numpy.var(absError) / numpy.var(yData))

print()
print('RMSE:', RMSE)
print('R-squared:', Rsquared)

print()


##########################################################
# graphics output section
def ModelAndScatterPlot(graphWidth, graphHeight):
    f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
    axes = f.add_subplot(111)

    # first the raw data as a scatter plot
    axes.plot(xData, yData,  'D')

    # create data for the fitted equation plot
    xModel = numpy.linspace(min(xData), max(xData))
    yModel = func(xModel, *fittedParameters)

    # now the model as a line plot
    axes.plot(xModel, yModel)

    axes.set_xlabel('X Data') # X axis data label
    axes.set_ylabel('Y Data') # Y axis data label

    plt.show()
    plt.close('all') # clean up after using pyplot

graphWidth = 800
graphHeight = 600
ModelAndScatterPlot(graphWidth, graphHeight)

answered Oct 21 '22 11:10

James Phillips

Related questions
                            
                                Matlab lsqnonlin in Python
                            
                                Pygraphviz Installation Failed with error code 1083 Cannot open file graphviz/cgraph.h: No such file or directory
                            
                                Django JsonResponse returning content-type text/html instead of application/json
                            
                                Python converting pdf to image?
                            
                                pyqtgraph: save / export 3d graph
                            
                                Merge time to time period
                            
                                How to convert SQL scalar subquery to SQLAlchemy expression
                            
                                pandas to_latex(): how to make lines in tables?
                            
                                blosc compression in hdf5 via h5py
                            
                                TypeError: a bytes-like object is required, not 'Binary' [closed]
                            
                                How to implement a RabbitMQ consumer using Pyspark Streaming module?
                            
                                Python get system environment variables on windows
                            
                                How do I override a function of a python library?
                            
                                Python How to remove \" from a string
                            
                                Numpy: Set array element to another array
                            
                                Getting "not supported calendar message.ics" attachment with outlook email invite from Python
                            
                                Python: Identifying a State Space Model for a System
                            
                                Pandas - Category variable and group by - is this a bug?
                            
                                Sequentially fitting Random Forest sklearn
                            
                                How to install a python library manually

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How can I fit two linear functions to a set of data points?

Tags:

python

curve-fitting

data-fitting

curious_cosmo

People also ask

2 Answers

Felix

James Phillips

Recent Activity

Donate For Us