
In R, how do you get the best fitting equation to a set of data?

Tags: r, equation

I'm not sure whether R can do this (I assume it can, but maybe that's just because I tend to assume that R can do anything :-)). What I need is to find the best-fitting equation to describe a dataset.

For example, if you have these points:

df = data.frame(x = c(1, 5, 10, 25, 50, 100), y = c(100, 75, 50, 40, 30, 25))

How do you get the best fitting equation? I know that you can get the best fitting curve with:

plot(loess(df$y ~ df$x))

But as I understand it, you can't extract the equation from a loess fit; see Loess Fit and Resulting Equation.

When I try to build it myself (note, I'm not a mathematician, so this is probably not the ideal approach :-)), I end up with something like:

y.predicted = 12.71 + ( 95 / (( (1 + df$x) ^ .5 ) / 1.3))

This seems to approximate it reasonably well, but I can't help thinking that something more elegant probably exists :-)
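
As a minimal sketch of how a hand-built formula like this could be refined numerically: assuming the curve has the form y = a + b / (1 + x)^p (the same shape as the formula above, with a = 12.71, b = 95 * 1.3 and p = 0.5), base R's optim() can search for the parameter values that minimize the sum of squared errors. The names model, sse and start are made up for this illustration, and the chosen form is an assumption, not the only reasonable one.

```r
df <- data.frame(x = c(1, 5, 10, 25, 50, 100), y = c(100, 75, 50, 40, 30, 25))

# Assumed model form, taken from the hand-built formula: y = a + b / (1 + x)^p
model <- function(par, x) par[[1]] + par[[2]] / (1 + x)^par[[3]]

# Sum of squared errors between the model and the observed y values
sse <- function(par) sum((df$y - model(par, df$x))^2)

# Start from the hand-tuned values: a = 12.71, b = 95 * 1.3, p = 0.5
start <- c(a = 12.71, b = 95 * 1.3, p = 0.5)
fit <- optim(start, sse)  # Nelder-Mead search by default

fit$par     # refined values of a, b and p
sse(start)  # squared error of the hand-built formula
fit$value   # squared error after refinement
```

The optimizer only tunes the parameters of the shape it is given; whether that shape is a sensible one still has to be decided by hand.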

I have the feeling that fitting a linear or polynomial model also wouldn't work, because the formula seems different from what those models generally use (i.e. this one seems to need divisions, powers, etc). For example, the approach in Fitting polynomial model to data in R gives pretty bad approximations.

I remember from a long time ago that there exist languages (Matlab may be one of them?) that do this kind of stuff. Can R do this as well, or am I just at the wrong place?

(Background info: basically, what we need to do is find an equation for determining the numbers in the second column based on the numbers in the first column; but we decide the numbers ourselves. We have an idea of what we want the curve to look like, but we can adjust these numbers to an equation if we get a better fit. It's about the pricing for a product (a cheaper alternative to current expensive software for qualitative data analysis); the more 'project credits' you buy, the cheaper it should become. Rather than forcing people to buy a given number (e.g. 5 or 10 or 25), it would be nicer to have a formula so people can buy exactly what they need - but of course this requires a formula. We have an idea for some prices we think are ok, but now we need to translate this into an equation.)

Matherion asked Feb 19 '23 21:02


2 Answers

My usual plug: http://creativemachines.cornell.edu/eureqa

But as Roland said, the "best fit in general" has little meaning, since any sufficiently smooth function can be approximated by a Taylor series, so many different equations will fit the same points. Since a set of data is expected to have noise (i.e. errors in its values), a big part of curve fitting is determining what is noise and what isn't.
If you pick some fit function arbitrarily, one thing I can pretty much guarantee is that extrapolated points will diverge in a hurry.
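
A small sketch of that divergence, using the six points from the question and an arbitrarily chosen fit function (a degree-5 polynomial, picked here purely for illustration): with six coefficients it passes through all six points, yet its predictions beyond x = 100 bear little relation to the data.

```r
df <- data.frame(x = c(1, 5, 10, 25, 50, 100), y = c(100, 75, 50, 40, 30, 25))

# Six points, six coefficients: a degree-5 polynomial passes through every point
poly_fit <- lm(y ~ poly(x, 5), data = df)
max(abs(resid(poly_fit)))  # essentially zero on the fitted points

# Evaluate the same polynomial outside the observed range of x (1 to 100)
predict(poly_fit, newdata = data.frame(x = c(150, 200)))
```

Comparing the extrapolated values with the observed y range (25 to 100) shows why a perfect fit on the data says little about how useful the equation is elsewhere.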

Carl Witthoft answered Feb 21 '23 12:02


Multiple Linear Regression Example

fit <- lm(y ~ x1 + x2 + x3, data=mydata)

summary(fit) # show results

The code above fits a linear model to your data by ordinary least squares (OLS). Here mydata, y, and x1 to x3 are placeholders for your own data frame and columns.
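
As a hedged sketch of how this could apply to the data in the question: lm() only needs the model to be linear in its coefficients, so a curved shape can still be fitted by transforming the predictor. The form y = a + b / sqrt(1 + x) used below is an assumption borrowed from the asker's hand-built formula, not something lm() chooses for you.

```r
df <- data.frame(x = c(1, 5, 10, 25, 50, 100), y = c(100, 75, 50, 40, 30, 25))

# Assumed curve shape: y = a + b / sqrt(1 + x), which is linear in a and b
fit <- lm(y ~ I(1 / sqrt(1 + x)), data = df)

coef(fit)     # intercept a and slope b of the fitted equation
summary(fit)  # standard errors, R-squared and residuals
```

The two coefficients can then be plugged back into a + b / sqrt(1 + x) to get a price for any number of credits.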

philq answered Feb 21 '23 11:02