For the last two days I've been trying to fit a curve to the following data set. As you can see from the image, the data forms a near-perfect curve on its own, but I haven't been able to find a way to represent it mathematically for both interpolation and extrapolation.
y = c(0.2966, 0.2793, 0.2147, 0.1523, 0.1177, 0.1026, 0.0934, 0.0767,
0.0729, 0.0693, 0.0658, 0.0624, 0.0561, 0.0502, 0.0424, 0.04, 0.0356,
0.0335, 0.0316, 0.0279, 0.0231, 0.0217, 0.0203, 0.019, 0.02, 0.016,
0.0151, 0.0134, 0.0127, 0.0119, 0.0113, 0.0106, 0.01, 0.0094, 0.0089,
0.0084, 0.0074, 0.007, 0.0062, 0.0059, 0.0053, 0.0048, 0.0043,
0.0041, 0.0037, 0.0033, 0.0032, 0.003, 0.0029, 0.0025, 0.0024,
0.0023, 0.0021, 0.002, 0.0016, 0.0016, 0.0014, 0.0012, 0.001,
0.0007, 0.0006, 0.0004, 0.0003)
x = c(0.77894, 0.79452, 0.85683, 0.92694, 0.97367, 0.99704, 1.01262,
1.04378, 1.05157, 1.05936, 1.06714, 1.07493, 1.09051, 1.10609,
1.12946, 1.13725, 1.15283, 1.16062, 1.16841, 1.18399, 1.20735,
1.21514, 1.22293, 1.23072, 1.2463, 1.25409, 1.26188, 1.27746,
1.28525, 1.29304, 1.30083, 1.30862, 1.3164, 1.32419, 1.33198,
1.33977, 1.35535, 1.36314, 1.37872, 1.38651, 1.40209, 1.41767,
1.43325, 1.44103, 1.45661, 1.47219, 1.47998, 1.48777, 1.49556,
1.51893, 1.52672, 1.53451, 1.55009, 1.55788, 1.58903, 1.59682,
1.6124, 1.63577, 1.67472, 1.75261, 1.79156, 1.86945, 1.92398)
Here is the data plotted with an exponential curve (pink) and a 4th-order polynomial (red). The exponential curve has quite a bit of error, while the 4th-order polynomial fits well but can't be used for extrapolation, and it doesn't always work when applied to similar data sets.
For what I'm working on I really need something that fits the curve precisely, but I haven't figured out how to do it yet. Thanks.

An object lesson in the perils of extrapolation.
In the absence of a theoretical model, your data is fit about equally well using either a logistic function (f1(...)) or a scaled log-normal density function (f2(...)). There are probably other functions that would also fit well.
df <- data.frame(x, y)
library(minpack.lm)   # for nlsLM(...)

# candidate 1: logistic-type function
f1 <- function(x,a,b,c,d) a*exp(-(b*x))/(1+c*exp(-d*x))
fit.1 <- nlsLM(y~f1(x,a,b,c,d), df,
               start=c(a=1, b=1, c=100, d=0), control=list(maxiter=500))

# candidate 2: scaled log-normal density
f2 <- function(x,a,m,s) a*dlnorm(x, meanlog=m, sdlog=s)
fit.2 <- nlsLM(y~f2(x,a,m,s), df,
               start=c(a=1, m=0, s=1), control=list(maxiter=500))
plot(y~x,df)
curve(predict(fit.1,data.frame(x)),add=TRUE, col="blue")
curve(predict(fit.2,data.frame(x)),add=TRUE, col="red")
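
To put a number on "about equally well", a quick sketch: nlsLM() returns ordinary nls objects, so base R's residuals() and AIC() apply directly.

# compare the two fits numerically: residual sum of squares and AIC
sum(residuals(fit.1)^2)
sum(residuals(fit.2)^2)
AIC(fit.1, fit.2)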

But look what happens when you extrapolate.
plot(y~x, df, xlim=c(0.5,2), ylim=c(0,.6))
curve(predict(fit.1,data.frame(x)),add=TRUE, col="blue")
curve(predict(fit.2,data.frame(x)),add=TRUE, col="red")

It turns out that the log-normal density function is the slightly better fit, in that its residuals are more nearly normal, although there is a strong pattern in the residuals in both cases. The point is that, just by looking at the data and the fitted curves, you might accept either function, yet they give very different results on extrapolation, and in fact neither one is a great fit. You really need a theoretical model.
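To check the residual claim yourself, a minimal sketch using base R's diagnostics (residuals() and qqnorm() work directly on the fitted nls objects):

# residual diagnostics: look for patterns vs. x and departures from normality
par(mfrow=c(2,2))
plot(df$x, residuals(fit.1), main="f1 residuals vs. x"); abline(h=0, lty=2)
plot(df$x, residuals(fit.2), main="f2 residuals vs. x"); abline(h=0, lty=2)
qqnorm(residuals(fit.1), main="f1 normal Q-Q"); qqline(residuals(fit.1))
qqnorm(residuals(fit.2), main="f2 normal Q-Q"); qqline(residuals(fit.2))
par(mfrow=c(1,1))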
A LOESS regression seems to work quite well with that data.
plot(y~x)
fit.lo <- loess(y~x, span = 0.5)   # local regression; span controls smoothness
pr <- predict(fit.lo, x)           # x is already sorted, so lines() draws cleanly
lines(x, pr, col = "red", lwd = 2)
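
One caveat, as a sketch: by default predict() on a loess fit does not extrapolate; points outside the range of x come back as NA. Fitting with loess.control(surface = "direct") allows extrapolation, though it is just as hazardous here as with the parametric fits above.

xx <- seq(0.5, 2.2, length.out = 200)
predict(fit.lo, xx)                # NA outside range(x)
fit.dir <- loess(y ~ x, span = 0.5,
                 control = loess.control(surface = "direct"))
plot(y ~ x, xlim = c(0.5, 2.2), ylim = c(0, 0.6))
lines(xx, predict(fit.dir, xx), col = "blue", lty = 2)  # extrapolates; use caution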
