Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to find good start values for nls function?

Tags:

r

ggplot2

nls

I don't understand why I can't have a nls function for these data. I have tried with a lot of different start values and I have always the same error.

Here is what I have been doing:

expFct2 = function (x, a, b,c)
{
  a*(1-exp(-x/b)) + c  
}
vec_x <- c(77.87,87.76,68.6,66.29)
vec_y <- c(1,1,0.8,0.6)
dt <- data.frame(vec_x=vec_x,vec_y=vec_y)
ggplot(data = dt,aes(x = vec_x, y = vec_y)) +  geom_point() + 
     geom_smooth(data=dt, method="nls", formula=y~expFct2(x, a, b, c),
       se=F, start=list(a=1, b=75, c=-5)

I have always this error:

Error in method(formula, data = data, weights = weight, ...) : 
  singular gradient
like image 236
Tali Avatar asked Mar 13 '12 21:03

Tali


3 Answers

This can be written with two linear parameters (.lin1 and .lin2) and one nonlinear parameter (b) like this:

a*(1-exp(-x/b)) + c  
= (a+c) - a * exp(-x/b)
= .lin1 + .lin2 * exp(-x/b)

where .lin1 = a+c and .lin2 = -a (so a = - .lin2 and c = .lin1 + .lin2) This lets us use "plinear" which only requires specification of a starting value for the single nonlinear parameter (eliminating the problem of how to set the starting values for the other parameters) and which converges despite the starting value of b=75 being far from that of the solution:

nls(y ~ cbind(1, exp(-x/b)), start = list(b = 75), alg = "plinear")

Here is the result of a run from which we can see from the size of .lin2 that the problem is badly scaled:

> x <- c(77.87,87.76,68.6,66.29)
> y <- c(1,1,0.8,0.6)
> nls(y ~ cbind(1, exp(-x/b)), start = list(b = 75), alg = "plinear")
Nonlinear regression model
  model:  y ~ cbind(1, exp(-x/b)) 
   data:  parent.frame() 
         b      .lin1      .lin2 
 3.351e+00  1.006e+00 -1.589e+08 
 residual sum-of-squares: 7.909e-05

Number of iterations to convergence: 9 
Achieved convergence tolerance: 9.887e-07 
> R.version.string
[1] "R version 2.14.2 Patched (2012-02-29 r58660)"
> win.version()
[1] "Windows Vista (build 6002) Service Pack 2"

EDIT: added sample run and comment on scaling.

like image 194
G. Grothendieck Avatar answered Nov 20 '22 02:11

G. Grothendieck


Fitting a three-parameter nonlinear model to four data points is going to be moderately challenging in any case, although in this case the data are well behaved. Point #1 is that your starting value for your c parameter (-5) was way off. Drawing a picture of the curve corresponding to your starting parameters (see below) would help you understand this (so would recognizing that the curve you get will range from c at its minimum to c+a at its maximum, and the range of your data is from 0.6 to 1 ...)

However, even with a better starting guess I found myself fussing with control parameters (i.e. control=nls.control(maxiter=200)), followed by more warnings -- nls is not known for its robustness. So I tried the SSasympOff model, which implements a self-starting version of the curve you want to fit.

start1 <- list(a=1, b=75, c=-5)
start2 <- list(a=0.5, b=75, c=0.5)  ## a better guess

pfun <- function(params) {
  data.frame(vec_x=60:90,
             vec_y=do.call(expFct2,c(list(x=60:90),params)))
}
library(ggplot2)
ggplot(data = dt,aes(x = vec_x, y = vec_y)) +  geom_point() +
  geom_line(data=pfun(start1))+
  geom_line(data=pfun(start2),colour="red")+
  geom_smooth(data=dt, method="nls", formula=y~SSasympOff(x, a, b, c),
              se=FALSE)

My advice in general is that it's easier to figure out what's going on and fix problems if you fit nls outside of geom_smooth and construct the curve you want to add using predict.nls ...

More generally, the way to get good starting parameters is to understand the geometry of the function you are fitting, and which parameters control which aspects of the curve. As I mentioned above, c is the minimum value of the shifted saturating-exponential curve, a is the range, and b is a scale parameter (you can see that when x=b, the curve is 1-exp(-1) or approximately 2/3 of the way from the minimum to the maximum). Either a bit of algebra and calculus (i.e. taking limits), or playing around with the curve() function, are good ways to gather this information.

like image 10
Ben Bolker Avatar answered Nov 20 '22 02:11

Ben Bolker


I struggle to find an interpretation to your parameters: a is a slope, b the speed of convergence, and a+c the limit, but c by itself does not seem to mean much. After reparametrizing your function, the problem disappears.

f <- function (x, a,b,c) a + c * exp(-x/abs(b))
nls(y~f(x, a, b, c), data=dt, start=list(a=1, b=75, c=-5), trace=TRUE)

However, the value of c looks very, very high: that is probably why the model initially failed to converge.

Nonlinear regression model
  model:  y ~ f(x, a, b, c) 
   data:  dt 
         a          b          c 
 1.006e+00  3.351e+00 -1.589e+08 
 residual sum-of-squares: 7.909e-05

Number of iterations to convergence: 9 
Achieved convergence tolerance: 2.232e-06 

Here is another, more reasonable parametrization of the same function.

g <- function (x, a,b,c) a * (1-exp(-(x-c)/abs(b)))
nls(y~g(x, a, b, c), data=dt, start=list(a=1, b=75, c=-5), trace=TRUE)

Nonlinear regression model
  model:  y ~ g(x, a, b, c) 
   data:  dt 
     a      b      c 
 1.006  3.351 63.257 
 residual sum-of-squares: 7.909e-05

Number of iterations to convergence: 10 
Achieved convergence tolerance: 1.782e-06 
like image 2
Vincent Zoonekynd Avatar answered Nov 20 '22 02:11

Vincent Zoonekynd