
Fitting an unknown curve [closed]

There are some related questions that I've come across (like this, this, this, and this), but they all deal with fitting data to a known curve. Is there a way to fit given data to an unknown curve? By that I mean an algorithm that, given some data, returns a fit that is a single function or a sum of functions. I'm programming in C, but I'm at a complete loss as to how to use the GSL library to do this. I'm open to using anything that can (ideally) be piped through C. Any help on which direction to look in would be greatly appreciated.

EDIT: This is basically experimental (physics) data that I've collected, so the data will have some trend modified by additive Gaussian noise. In general the trend will be non-linear, so I guess that linear regression will be unsuitable. As for the ordering, the data is time-ordered, so the curve has to be fit in that order.

Kitchi asked Jan 01 '13 13:01





2 Answers

You might be looking for polynomial interpolation, a topic in numerical analysis.

In polynomial interpolation, given a set of n points (x, y) with distinct x values, you find the unique polynomial of degree at most n-1 that passes through all of them. One way to do it is Newton interpolation, which is fairly easy to program.

Numerical analysis, and interpolation in particular, is widely studied, and you might be able to get a useful upper bound on the error of the polynomial.

Note, however, that because you are forcing a polynomial through your data and the underlying function is not really a polynomial, the error can grow very quickly as you move away from the points you fitted.


Also note that your data set is finite, and there are infinitely many (in fact, uncountably many) functions that fit the data, exactly or approximately, so which one is best depends on what you are actually trying to achieve.

If you are looking for a model to fit your data, note that linear regression and polynomial interpolation sit at opposite ends of the scale: polynomial interpolation may overfit (it follows the noise as well as the trend), while linear regression may underfit. Which to use is case specific and varies from one application to another.
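To make the underfitting end of that scale concrete, here is a minimal sketch in C of an ordinary least-squares straight-line fit using the usual closed-form formulas. The name fit_line and its signature are made up for this illustration; they are not part of GSL or any other library.

```c
#include <assert.h>
#include <stddef.h>

/* Fit y ~ slope * x + intercept by ordinary least squares.
 * Stores the coefficients through the two output pointers and
 * returns the sum of squared residuals of the fitted line.
 * (Hypothetical helper written for this example.) */
double fit_line(const double *x, const double *y, size_t n,
                double *slope, double *intercept)
{
    assert(n >= 2);

    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    for (size_t i = 0; i < n; ++i) {
        sx  += x[i];
        sy  += y[i];
        sxx += x[i] * x[i];
        sxy += x[i] * y[i];
    }

    double denom = (double)n * sxx - sx * sx;
    assert(denom != 0.0); /* the x values must not all be equal */

    *slope = ((double)n * sxy - sx * sy) / denom;
    *intercept = (sy - *slope * sx) / (double)n;

    /* Sum of squared residuals: zero only if the data are collinear. */
    double sse = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double r = y[i] - (*slope * x[i] + *intercept);
        sse += r * r;
    }
    return sse;
}
```

For the three points (0,1), (1,2), (3,10), which lie on a parabola, the fitted line has a slope of about 3.14 and leaves a sum of squared residuals of about 2.57. A straight line cannot follow the curvature, while an interpolating polynomial through the same points would have zero residual.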


Simple polynomial interpolation example:

Let's say we have (0,1),(1,2),(3,10) as our data.

The table (1) we get using Newton's method is:

x  | y  | 1st differences | 2nd differences
0  | 1  |                 |
1  | 2  | (2-1)/(1-0)=1   |
3  | 10 | (10-2)/(3-1)=4  | (4-1)/(3-0)=1

Now, the coefficients of the polynomial are the "diagonal" of the table (1, 1, 1), ending with the last element:

1 + 1*(x-0) + 1*(x-0)(x-1) = 1 + x + x^2 - x = x^2 +1 

(and that is a perfect fit indeed to the data we used)


(1) The table is built recursively: the first two columns are the x and y values, and each subsequent column is computed from the previous one. It is really easy to implement once you get it; the full explanation is on the Wikipedia page for Newton interpolation.
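The table construction and the evaluation of the resulting polynomial can be sketched in C roughly as follows. This is only an illustration: the names newton_coefficients and newton_eval are invented here, the table is computed in place in a single array instead of being stored column by column, and the x values are assumed to be pairwise distinct.

```c
#include <assert.h>
#include <stddef.h>

/* On entry coef[0..n-1] holds the y values. On exit it holds the
 * "diagonal" of the divided-difference table, i.e. the Newton
 * coefficients. The x values must be pairwise distinct.
 * (Hypothetical helper written for this example.) */
void newton_coefficients(const double *x, double *coef, size_t n)
{
    assert(n > 0);
    for (size_t level = 1; level < n; ++level) {
        /* Walk from the bottom row up so that entries of the previous
         * column are read before they are overwritten. */
        for (size_t i = n - 1; i >= level; --i) {
            coef[i] = (coef[i] - coef[i - 1]) / (x[i] - x[i - level]);
        }
    }
}

/* Evaluate c0 + c1*(t-x0) + c2*(t-x0)*(t-x1) + ... at t using a
 * Horner-style nested multiplication. */
double newton_eval(const double *x, const double *coef, size_t n, double t)
{
    assert(n > 0);
    double result = coef[n - 1];
    for (size_t i = n - 1; i-- > 0; ) {
        result = result * (t - x[i]) + coef[i];
    }
    return result;
}
```

With x = {0, 1, 3} and coef initialised to {1, 2, 10}, newton_coefficients leaves coef = {1, 1, 1}, matching the diagonal of the table above, and newton_eval at t = 2 gives 5, which agrees with x^2 + 1.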

amit answered Sep 23 '22 20:09



You might want to use a (Fast) Fourier Transform to convert the data to the frequency domain.

With the result of the transform (a set of amplitudes, phases, and frequencies), even the most twisted data set can be represented as a sum of functions (harmonics) of the form:

r * cos(f * t - p)

where r is the harmonic's amplitude, f its (angular) frequency, and p its phase.

Finally, the unknown data curve is the sum of all the harmonics.

I have done this in R (you have some examples of it), but I believe C has enough tools to manage it. It is also possible to pipe between C and R, but I don't know much about that. This might be of help.

This method works well for large data sets because of its complexity:

1) decompose the data with a Fast Fourier Transform (FFT): O(n log n)

2) build the function from the resulting components: O(n)

Helio Santos answered Sep 23 '22 20:09
