
Error using `loess.smooth` but not `loess` or `lowess`

Tags: r, smooth, loess

I need to smooth some simulated data, but occasionally run into problems when the simulated ordinates to be smoothed are mostly the same value. Here is a small reproducible example of the simplest case.

> x <- 0:50
> y <- rep(0,51)
> loess.smooth(x,y)
Error in simpleLoess(y, x, w, span, degree, FALSE, FALSE, normalize = FALSE,  : 
   NA/NaN/Inf in foreign function call (arg 1)

loess(y~x), lowess(x,y), and their analogue in MATLAB produce the expected results without error on this example. I am using loess.smooth here because I need the estimates evaluated at a set number of points. According to the documentation, I believe loess.smooth and loess are using the same estimation functions, but the former is an "auxiliary function" to handle the evaluation points. The error seems to come from a C function:

> traceback()
3: .C(R_loess_raw, as.double(pseudovalues), as.double(x), as.double(weights), 
   as.double(weights), as.integer(D), as.integer(N), as.double(span), 
   as.integer(degree), as.integer(nonparametric), as.integer(order.drop.sqr), 
   as.integer(sum.drop.sqr), as.double(span * cell), as.character(surf.stat), 
   temp = double(N), parameter = integer(7), a = integer(max.kd), 
   xi = double(max.kd), vert = double(2 * D), vval = double((D + 
       1) * max.kd), diagonal = double(N), trL = double(1), 
   delta1 = double(1), delta2 = double(1), as.integer(0L))
2: simpleLoess(y, x, w, span, degree, FALSE, FALSE, normalize = FALSE, 
   "none", "interpolate", control$cell, iterations, control$trace.hat)
1: loess.smooth(x, y)

loess also calls simpleLoess, but with what appear to be different arguments. Of course, if enough of the y values are nonzero, loess.smooth runs without error, but I need the program to run even in the most extreme case.

Hopefully, someone can help me with any or all of the following:

  1. Understand why only loess.smooth, and not the other functions, produces this error and find a solution for this problem.
  2. Find a work-around using loess but still evaluating the estimate at a specified number of points that can differ from the vector x. For example, I might want to use only x <- seq(0,50,10) in the smoothing, but evaluate the estimate at x <- 0:50. As far as I know, using predict with a new data frame will not properly handle this situation, but please let me know if I am missing something there.
  3. Handle the error in a way that doesn't stop the program from moving on to the next simulated data set.

Thanks in advance for any help on this problem.

Sandy asked Jan 10 '11 10:01


1 Answer

For part 1: This took a bit of tracking down, but if you do:

loess.smooth(x, y, family = "gaussian")

the model will fit. This arises from the different defaults of loess.smooth and loess: the former has family = c("symmetric", "gaussian") whilst the latter has the order reversed, so loess.smooth does a robust ("symmetric") fit by default. If you trawl through the code for loess and loess.smooth, you'll see that when family = "gaussian", iterations is set to 1; otherwise it takes the value loess.control()$iterations. When simpleLoess runs more than one iteration on your data, the following function call returns a vector of NaN:

pseudovalues <- .Fortran(R_lowesp, as.integer(N), as.double(y), 
            as.double(z$fitted.values), as.double(weights), as.double(robust), 
            integer(N), pseudovalues = double(N))$pseudovalues

Which causes the next function call to throw the error you saw:

zz <- .C(R_loess_raw, as.double(pseudovalues), as.double(x), 
            as.double(weights), as.double(weights), as.integer(D), 
            as.integer(N), as.double(span), as.integer(degree), 
            as.integer(nonparametric), as.integer(order.drop.sqr), 
            as.integer(sum.drop.sqr), as.double(span * cell), 
            as.character(surf.stat), temp = double(N), parameter = integer(7), 
            a = integer(max.kd), xi = double(max.kd), vert = double(2 * 
                D), vval = double((D + 1) * max.kd), diagonal = double(N), 
            trL = double(1), delta1 = double(1), delta2 = double(1), 
            as.integer(0L))

This all relates to robust fitting in Loess (the method). If you don't want/need a robust fit, use family = "gaussian" in your loess.smooth call.

Also, note that the defaults for loess.smooth differ from those of loess, e.g. for 'span' and 'degree'. So carefully check out what models you want to fit and adjust the relevant function's defaults.
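As a rough sketch of the points above, the defaults can be inspected directly and the non-robust fit requested explicitly. The `evaluation = 51` value is just an illustrative choice so that there is one estimate per original x value; the second call passes `span` and `degree` explicitly so they match what I understand to be the loess defaults.

```r
# The first element of each vector is the default family
eval(formals(loess.smooth)$family)
eval(formals(loess)$family)

# Number of iterations used when the fit is robust (family = "symmetric")
loess.control()$iterations

# The all-zero example from the question, fitted without robustness iterations
x <- 0:50
y <- rep(0, 51)
fit <- loess.smooth(x, y, family = "gaussian", evaluation = 51)
str(fit)

# Same call, but with span and degree set to the values loess() starts from
fit2 <- loess.smooth(x, y, span = 0.75, degree = 2,
                     family = "gaussian", evaluation = 51)
```

The result is a list with components `x` (the evaluation points) and `y` (the smoothed values).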

For part 2:

DF <- data.frame(x = 0:50, y = rep(0,51))
mod <- loess(y ~ x, data = DF)
pred <- predict(mod, newdata = data.frame(x = c(-1, 10, 15, 55)))
mod2 <- loess(y ~ x, data = DF, control = loess.control(surface = "direct"))
pred2 <- predict(mod2, newdata = data.frame(x = c(-1, 10, 15, 55)))

Which gives:

> pred
 1  2  3  4 
NA  0  0 NA 
> pred2
1 2 3 4 
0 0 0 0

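With the default surface = "interpolate", predict returns NA for points outside the range of the fitted x values; you need surface = "direct" if extrapolation is what you meant. Within the range of the data, I don't see any problem with using predict on a new data frame.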
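For the case described in the question, fitting on a coarse grid and evaluating on a finer one, a minimal sketch might look like this. The grid `seq(0, 50, by = 5)` and the `sin` response are made-up illustration data, not taken from the question; the grid is a little denser than `seq(0, 50, 10)` so that the default span has more points to work with.

```r
# Hypothetical data: a coarse grid used only for fitting
fit_df <- data.frame(x = seq(0, 50, by = 5))
fit_df$y <- sin(fit_df$x / 10)

mod <- loess(y ~ x, data = fit_df)

# Evaluate on a finer grid that stays inside the range of the fitted x values
new_df <- data.frame(x = 0:50)
pred <- predict(mod, newdata = new_df)

length(pred)  # one estimate per row of new_df
```

The column name in `newdata` has to match the predictor name used in the model formula (`x` here).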

For part 3: Look at ?try and ?tryCatch, which you can wrap around the loess fitting function (loess.smooth, say). This allows computations to continue if loess.smooth throws an error.

You will need to handle the output of try or tryCatch, for example with something like this (if you are doing this in a loop):

mod <- try(loess.smooth(x, y))
if(inherits(mod, "try-error"))
    next
## if we get here, the model fit worked; do something with `mod`

I would probably combine try or tryCatch with fitting via loess and using predict for such a problem.
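A sketch of that combination, assuming the simulated data sets are held in a list: `safe_smooth` and `sims` are hypothetical names introduced here for illustration, and `lapply` stands in for an explicit loop with `next`. The third data set is deliberately malformed (its length does not match `x`) so that the error path is exercised.

```r
# Hypothetical helper: fit with loess, predict on `grid`, return NULL on error
safe_smooth <- function(x, y, grid) {
  tryCatch({
    mod <- loess(y ~ x, data = data.frame(x = x, y = y))
    predict(mod, newdata = data.frame(x = grid))
  }, error = function(e) {
    # Report the problem but let the caller carry on
    message("smoothing failed: ", conditionMessage(e))
    NULL
  })
}

x <- 0:50
sims <- list(rep(0, 51),   # the extreme all-zero case
             sin(x / 10),  # an ordinary smooth signal
             c(1, 2))      # wrong length, so data.frame() throws an error

# lapply keeps a NULL entry for every data set that failed
results <- lapply(sims, function(y) safe_smooth(x, y, grid = x))

# Logical vector marking which data sets were smoothed successfully
ok <- !vapply(results, is.null, logical(1))
```

Returning `NULL` rather than stopping means the failed data sets can be counted or re-examined afterwards via `ok`.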

Gavin Simpson answered Sep 28 '22 09:09