Bailing out from error in large `sapply`

This question may or may not be inspired by my losing an entire 3-hour geocoding run because one of the values returned an error. Cue the pity (down)votes.

Basically there was an error returned inside a function called by sapply. I had options(error=recover) on, but despite browsing through every level available to me, I could not find any place where the results of the (thousands of successful) calls to FUN were stored in memory.

Some of the objects I found while browsing around themselves gave errors when I attempted to examine them, claiming the references were no longer valid. Unfortunately I lost the particular error message.

Here's a quick example which, while it does not replicate the reference error (which I suspect is related to disappearing environments and is probably immaterial), does demonstrate that I cannot see a way to save the data that has already been processed.

Is there such a technique?

Note that I have since realized my error and inserted even more robust error handling than existed before via try, but I am looking for a way to recover the contents ex post rather than ex ante.

Test function

sapply( seq(10), function(x) {
  if(x==5) stop("Error!")
  return( "important data" )
} )

Interactive exploration

> sapply( seq(10), function(x) {
+   if(x==5) stop("Error!")
+   return( "important data" )
+ } )
Error in FUN(1:10[[5L]], ...) : Error!

Enter a frame number, or 0 to exit   

1: sapply(seq(10), function(x) {
    if (x == 5) 
        stop("Error!")
    return("important data")
})
2: lapply(X = X, FUN = FUN, ...)
3: FUN(1:10[[5]], ...)

Selection: 3
Called from: FUN(1:10[[5L]], ...)
Browse[1]> ls()
[1] "x"
Browse[1]> x
[1] 5
Browse[1]> 
Enter a frame number, or 0 to exit   

1: sapply(seq(10), function(x) {
    if (x == 5) 
        stop("Error!")
    return("important data")
})
2: lapply(X = X, FUN = FUN, ...)
3: FUN(1:10[[5]], ...)

Selection: 2
Called from: lapply(X = X, FUN = FUN, ...)
Browse[1]> ls()
[1] "FUN" "X"  
Browse[1]> X
 [1]  1  2  3  4  5  6  7  8  9 10
Browse[1]> FUN
function(x) {
  if(x==5) stop("Error!")
  return( "important data" )
}
Browse[1]> 
Enter a frame number, or 0 to exit   

1: sapply(seq(10), function(x) {
    if (x == 5) 
        stop("Error!")
    return("important data")
})
2: lapply(X = X, FUN = FUN, ...)
3: FUN(1:10[[5]], ...)

Selection: 1
Called from: sapply(seq(10), function(x) {
    if (x == 5) 
        stop("Error!")
    return("important data")
})
Browse[1]> ls()
[1] "FUN"       "simplify"  "USE.NAMES" "X"        
Browse[1]> X
 [1]  1  2  3  4  5  6  7  8  9 10
Browse[1]> USE.NAMES
[1] TRUE
Browse[1]> simplify
[1] TRUE
Browse[1]> FUN
function(x) {
  if(x==5) stop("Error!")
  return( "important data" )
}
Browse[1]> Q

To be clear, what I was hoping to find was the vector:

[1] "important data" "important data" "important data" "important data"

In other words, the results of the internal loop that had been completed to this point.

Edit: Update with C code

Inside .Internal(lapply()) is the following code:

PROTECT(ans = allocVector(VECSXP, n));
...
for(i = 0; i < n; i++) {
   ...
   tmp = eval(R_fcall, rho);
   ...
   SET_VECTOR_ELT(ans, i, tmp);
}

I want to get at ans when any call to lapply fails.

Ari B. Friedman, asked Oct 22 '12 01:10


3 Answers

I'm struggling to see why try() isn't the way to go here. If the sapply() fails for whatever reason, then you

  1. want to handle that failure well
  2. carry on from there

Why would you want the entire data analysis/processing step to stop because of a single error? That is what you seem to be proposing. Rather than trying to recover what has already been done, write your code so that it carries on, recording that the error took place but also gracefully moving on to the next step in the process.

It is a bit convoluted because the example you give is contrived (if you knew what would cause an error you could handle that without a try()), but bear with me:

foo <- function(x) {
    res <- try({
        if(x==5) {
            stop("Error!")
        } else {
            "important data"
        }
    })
    if(inherits(res, "try-error"))
        res <- "error occurred"
    res
}

> sapply( seq(10), foo)
Error in try({ : Error!
 [1] "important data" "important data" "important data" "important data"
 [5] "error occurred" "important data" "important data" "important data"
 [9] "important data" "important data"

Having run jobs that took weeks to finish in the background on my workstation, I quickly learned to write lots of try() calls around individual statements rather than around big blocks of code, so that once an error occurred I could quickly get out of that iteration/step with the least effect on the running job. In other words, if a particular R call failed, I returned something that would slot nicely into the object returned by sapply() (or whatever function).
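As a rough sketch of the "return something that slots in" idea, the variant below uses tryCatch() to substitute a typed NA for a failed call, so that sapply() can still simplify the result to a character vector. The names safe_call, risky and placeholder are made up for illustration; they are not from the question.

```r
# Hypothetical helper: call f(x); on error, report it and return a placeholder
safe_call <- function(f, x, placeholder = NA_character_) {
  tryCatch(
    f(x),
    error = function(e) {
      # Note which input failed and why, then carry on with the placeholder
      message("failed for input ", x, ": ", conditionMessage(e))
      placeholder
    }
  )
}

# Stand-in for the real work (e.g. a geocoding call); fails on one input
risky <- function(x) {
  if (x == 5) stop("Error!")
  "important data"
}

res <- sapply(seq(10), function(x) safe_call(risky, x))
```

Here res keeps all ten slots, with NA marking the failed input, so the failures can be found afterwards with is.na(res) and retried.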

For anything more complex, I would probably use lapply():

foo2 <- function(x) {
    res <- try({
        if(x==5) {
            stop("Error!")
        } else {
            lm(rnorm(10) ~ runif(10))
        }
    })
    if(inherits(res, "try-error"))
        res <- "error occurred"
    res
}

out <- lapply(seq(10), foo2)
str(out, max = 1)

because you are going to want the list rather than trying to simplify more complex objects down to something simple:

>     out <- lapply(seq(10), foo2)
Error in try({ : Error!
> str(out, max = 1)
List of 10
 $ :List of 12
  ..- attr(*, "class")= chr "lm"
 $ :List of 12
  ..- attr(*, "class")= chr "lm"
 $ :List of 12
  ..- attr(*, "class")= chr "lm"
 $ :List of 12
  ..- attr(*, "class")= chr "lm"
 $ : chr "error occurred"
 $ :List of 12
  ..- attr(*, "class")= chr "lm"
 $ :List of 12
  ..- attr(*, "class")= chr "lm"
 $ :List of 12
  ..- attr(*, "class")= chr "lm"
 $ :List of 12
  ..- attr(*, "class")= chr "lm"
 $ :List of 12
  ..- attr(*, "class")= chr "lm"

That said, I'd probably have done this via a for() loop, filling in a preallocated list as I iterated.
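A minimal sketch of that for() loop approach, assuming a stand-in function risky (a made-up name) in place of the real work. Because out lives in the calling environment rather than inside lapply's C code, whatever was filled in before the error is still there afterwards. The try() around the loop is only there so the sketch runs to completion as a script; interactively the loop would simply stop with the error and out would be left partially filled.

```r
# Stand-in for the real work; fails on one input
risky <- function(x) {
  if (x == 5) stop("Error!")
  "important data"
}

inputs <- seq(10)
out <- vector("list", length(inputs))   # preallocate one slot per input

# No per-iteration error handling on purpose: the loop aborts at the
# first failure, but `out` and `i` remain in the calling environment
try(
  for (i in seq_along(inputs)) {
    out[[i]] <- risky(inputs[[i]])
  },
  silent = TRUE
)

i              # index at which the loop stopped
unlist(out)    # results completed before the failure
```

After fixing the problem, the loop can be restarted from i instead of from the beginning.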

Gavin Simpson, answered Sep 30 '22 00:09


You never assigned the intermediate values to anything. I don't understand why you think there should be any entrails to divine. You need to record the values somehow:

 res <- sapply( seq(10), function(x) {
   on.exit(res <<- x)   # runs even when the call errors; records the last x reached
   if(x==5) stop("Error!")
 } )
Error in FUN(1:10[[5L]], ...) : Error!
 res
#[1] 5

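This on.exit method is illustrated on the ?par page as a way of restoring par settings when plotting has gone wrong. (I was not able to get it to work with on.exit(res <- x), since a plain <- assigns only in the function's own frame, which is discarded when the call exits.)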
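The difference between the two assignment operators inside on.exit() can be checked with a small sketch. The names last_seen, f_local and f_global are hypothetical, and try() is used only so that the script keeps running past the error.

```r
last_seen <- NULL

# on.exit with `<-`: assigns a local variable that vanishes with the frame
f_local <- function(x) {
  on.exit(last_seen <- x)
  if (x == 5) stop("Error!")
}

# on.exit with `<<-`: assigns the variable found in the enclosing environment
f_global <- function(x) {
  on.exit(last_seen <<- x)
  if (x == 5) stop("Error!")
}

try(sapply(seq(10), f_local), silent = TRUE)
after_local <- last_seen     # unchanged by the local assignment

try(sapply(seq(10), f_global), silent = TRUE)
after_global <- last_seen    # last x for which the function was entered
```

Note that either way this records which input was being processed when the error happened, not the values already returned for earlier inputs.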

IRTFM, answered Sep 29 '22 22:09


Maybe I'm not understanding, and this will certainly slow you down, but what about a global assignment each time?

safety <- vector()
sapply( seq(10), function(x) {
  if(x==5) stop("Error!")
  assign('safety', c(safety, x), envir = .GlobalEnv)
  return( "important data" )
} )

Yields:

> safety <- vector()
> sapply( seq(10), function(x) {
+   if(x==5) stop("Error!")
+   assign('safety', c(safety, x), envir = .GlobalEnv)
+   return( "important data" )
+ } )
Error in FUN(1:10[[5L]], ...) : Error!
> safety
[1] 1 2 3 4
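
A possible variation on the same idea, sketched under the assumption that the results rather than the inputs are what need saving: write each result into an environment keyed by the input, which also avoids regrowing a vector with c() on every call. The names store and risky are made up for illustration, and try() is only there so the script continues after the error.

```r
store <- new.env()   # side store that survives a failed sapply()

# Stand-in for the real work; fails on one input
risky <- function(x) {
  if (x == 5) stop("Error!")
  "important data"
}

try(
  sapply(seq(10), function(x) {
    val <- risky(x)
    assign(as.character(x), val, envir = store)  # save before returning
    val
  }),
  silent = TRUE
)

# Collect what was saved, ordered numerically by key
keys <- ls(store)
keys <- keys[order(as.integer(keys))]
partial <- unlist(mget(keys, envir = store))
```

The names of partial are the inputs that completed, so the remaining inputs can be worked out and rerun.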

Tyler Rinker, answered Sep 30 '22 00:09