Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Finding a curve to match data

I'm looking for a non-linear curve fitting routine (probably most likely to be found in R or Python, but I'm open to other languages) which would take x,y data and fit a curve to it.

I should be able to specify as a string the type of expression I want to fit.

Examples:

"A+B*x+C*x*x"
"(A+B*x+C*x*x)/(D*x+E*x*x)"
"sin(A+B*x)*exp(C+D*x)+E+F*x"

What I would get out of this is at least the values for the constants (A, B, C, etc.) And hopefully stats about the fitness of the match.

There are commercial programs to do this, but I expected to be able to find something as common as fitting to a desired expression in a language library nowadays. I suspect SciPy's optimization stuff might be able to do this, but I can't see that it lets me define an equation. Likewise, I can't seem to find exactly what I want in R.

Is what I'm looking for out there, or do I need to roll my own? I hate to do it if it's there and I'm just having trouble finding it.


Edit: I want to do this for a bit more control over the process than I get from LAB Fit. The LAB Fit UI is dreadful. I'd also like to be able to break the range into multiple pieces and have different curves represent the different pieces of the range. In the end, the result has to be able to (speed-wise) beat a LUT with linear interpolation or I'm not interested.

In my current set of problems, I have trig functions or exp() and I need to execute them 352,800 times per second in real time (and use only a fraction of the CPU). So I plot the curve and use the data to drive the curve fitter to get less expensive approximations. In the old days, LUTs were almost always the solution, but nowadays skipping the memory lookups and doing an approximation is sometimes faster.

like image 469
Nosredna Avatar asked Aug 31 '09 16:08

Nosredna


3 Answers

To answer your question in a general sense (regarding parameter estimation in R) without considering the specifics of the equations you pointed out, I think you are looking for nls() or optim()... 'nls' is my first choice as it provides error estimates for each estimated parameter and when it fails I use 'optim'. If you have your x,y variables:

out <- tryCatch(nls( y ~ A+B*x+C*x*x, data = data.frame(x,y), 
                start = c(A=0,B=1,C=1) ) ,
                error=function(e) 
                optim( c(A=0,B=1,C=1), function(p,x,y)  
                      sum((y-with(as.list(p),A + B*x + C*x^2))^2), x=x, y=y) )

to get the coefficients, something like

getcoef <- function(x) if(class(x)=="nls") coef(x) else x$par
getcoef(out)

If you want the standard errors in the case of 'nls',

summary(out)$parameters

The help files and r-help mailing list posts contain many discussions regarding specific minimization algorithms implemented by each (the default used in each example case above) and their appropriateness for the specific form of the equation at hand. Certain algorithms can handle box constraints, and another function called constrOptim() will handle a set of linear constraints. This website may also help:

http://cran.r-project.org/web/views/Optimization.html

like image 96
hatmatrix Avatar answered Oct 01 '22 13:10

hatmatrix


Your first model is actually linear in the three parameters and can be fit in R using

 fit <- lm(y ~ x + I(x^2), data=X)

which will get you your three parameters.

The second model can also be fit using nls() in R with the usual caveats of having to provide starting values etc. The statistical issues in optimization are not necessarily the same as the numerical issues -- you cannot just optimize any functional form no matter which language you choose.

like image 21
Dirk Eddelbuettel Avatar answered Oct 01 '22 13:10

Dirk Eddelbuettel


Check out GNU Octave - between its polyfit() and the nonlinear constraints solver it ought to be possible to construct something suitable for your problem.

like image 23
moonshadow Avatar answered Oct 01 '22 12:10

moonshadow