Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Algorithm to make a polynomial fit of a part of a data set

Tags:

algorithm

I have a problem of algorithm. I don't know if stackoverflow is the right place to post it but since I use matlab and want to do this with it, I post it there. My problem is the following : I have a set of data and I do not know much about it except the fact that at the end of this set, the points have to be quite linear. I want to make a linear fit of these points that are linearly distributed without using the part that is not.

(an image is always better to understand) :enter image description here

As you see there, I have the blue data, that is not linear but that has a linear part at the end (red part). What I would want, is to find an algorithm that allows me to know when the behaviour of the data curve ends its linearity.

I don't know if I'm clear ?

I've tried by taking a few points at the right and make a linear fit of those few points. Then add some points to the few and check if those are "near enough" of the linear fit. Then make once more a linear fit with the added points and so on but I think it's not the best solution because the "first" points have lot of noise (which is not represented here on the image)...

Do you have any idea or proposition or link ?

Thank you !

like image 490
mwoua Avatar asked Jul 11 '13 14:07

mwoua


2 Answers

What I would want, is to find an algorithm that allows me to know when the behaviour of the data curve ends its linearity.

Linear data has a particularly nice property, it has constant slope. The second derivative of a linear section should be approximately zero.

Use a spline fit (with some kind of smoothing if the data is noisy) to get a continuous version of your data, call it g(x). When g''(x) ~ 0, i.e. when the second derivative is small, this is a linear section.

like image 101
Hooked Avatar answered Sep 22 '22 13:09

Hooked


Snippet your data set by x-position and then set some cutoff for linearity.

  • Start at one end
  • Check the pearson correlation coefficient of the next predefined portion of the graph
  • If above a certain threshold add the included portion of x to your range, otherwise stop there

Alternately you could do a check for linear being the best fit of a number of polynomial fittings. For that I would:

  • Define a few generic functions of order 1-n where n is pretty small (maybe 3)
  • Add data points to linear testing set
  • Compare least squares value of your n functions
  • If linear has the lowest least squares value, or is within some distance of the minimum of your n function continue adding points. Otherwise stop and say that the function was linear before the last addition.

Those are at least pretty straightforward ways of doing it, and in my Occam's razor mind they also have the lowest complexity (n*curve-fit complexity in both cases, though the second one has a bigger constant.), though it's very possible that there lower complexity algorithms out there.

like image 35
Slater Victoroff Avatar answered Sep 21 '22 13:09

Slater Victoroff