 

tryCatch with parLapply (parallel package) in R

I am trying to run something on a very large dataset. Basically, I want to loop through all files in a folder and run the function fromJSON on each of them. However, I want it to skip over files that produce an error. I have built a function using tryCatch; however, it only works when I use lapply and not parLapply.

Here is my code for my exception handling function:

readJson <- function(file) {
  require(jsonlite)
  dat <- tryCatch(
    {
      fromJSON(file, flatten = TRUE)
    },
    error = function(cond) {
      message(cond)
      return(NA)
    },
    warning = function(cond) {
      message(cond)
      return(NULL)
    }
  )
  return(dat)
}

Then I call parLapply on a character vector, files, which contains the full paths to the JSON files:

dat <- parLapply(cl, files, readJson)

This produces an error when it reaches a file that doesn't end properly, and the list dat is never created. It does not skip over the problematic file, which is exactly what the readJson function was supposed to handle.

When I use regular lapply, however, it works perfectly fine: it still reports the errors, but it creates the list, skipping over the erroneous file.
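The contrast between lapply and parLapply can be reproduced in base R without a cluster. The sketch below assumes, as a simplification, that a worker evaluates each task inside its own tryCatch(error = ...) and sends back a "try-error" object; bad_parse, handled and worker_like are hypothetical names, with bad_parse standing in for fromJSON failing on a truncated file.

```r
# Hypothetical stand-in for fromJSON() hitting a truncated file.
bad_parse <- function() stop("unexpected end of input")

# Same shape as the error handler in readJson: it calls message(cond).
handled <- function() {
  tryCatch(bad_parse(), error = function(cond) {
    message(cond)  # cond is an *error* condition, so this signals it again
    NA
  })
}

# Like lapply() on the master: no outer error handler is active, so the
# re-signalled condition goes unhandled, message() prints its text to
# stderr, and the handler's NA is returned.
res_master <- handled()

# Simplified imitation (an assumption, not parallel's actual code) of a
# worker: each task runs inside an outer tryCatch(error = ...).
worker_like <- function(task) {
  tryCatch(task(), error = function(e)
    structure(conditionMessage(e), class = "try-error"))
}

# Here the outer handler catches the re-signalled error, so the whole task
# fails instead of returning NA.
res_worker <- worker_like(handled)
```

Replacing message(cond) with message(conditionMessage(cond)) inside handled() makes worker_like(handled) return NA as well, because message() then signals an ordinary message condition rather than the original error.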

Any ideas on how I could use exception handling with parLapply so that it skips over the problematic files and still generates the list?
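The question doesn't show how files or cl are created. A minimal sketch, assuming all the JSON files sit directly in one folder and a local PSOCK cluster is used; the helper name run_on_json_files and the default of two workers are illustrative choices, not part of the question:

```r
library(parallel)

# Hypothetical helper: apply `fun` to every .json file in `dir` on a local cluster.
run_on_json_files <- function(dir, fun, n_workers = 2) {
  # full.names = TRUE returns complete paths rather than bare file names.
  files <- list.files(dir, pattern = "\\.json$", full.names = TRUE)
  cl <- makeCluster(n_workers)
  # Shut the worker processes down even if parLapply() throws an error.
  on.exit(stopCluster(cl), add = TRUE)
  res <- parLapply(cl, files, fun)
  # Label each result with the file it came from.
  names(res) <- basename(files)
  res
}
```

With the readJson function above, this would be called as run_on_json_files(dir, readJson), where dir is a placeholder for your folder path. Because readJson calls require(jsonlite) itself, each worker loads jsonlite the first time it runs the function.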

asked Aug 19 '17 at 16:08 by user2905393



1 Answer

In your error handler function, cond is an error condition. message(cond) signals this condition, which is caught on the workers and transmitted to the master as an error. Either remove the message calls or replace them with something like message(conditionMessage(cond)). You won't see anything on the master, though, so removing them is probably best.
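Following the answer, a corrected readJson might drop the message() calls and instead return the error text alongside the NA, so the master can still tell which files were skipped. That reporting scheme is an illustrative choice, not part of the answer. The names readJsonSafe and readAllJson are hypothetical, and the parse argument is a hypothetical addition that defaults to jsonlite's fromJSON and exists only so the sketch can be tried with a stand-in parser:

```r
library(parallel)

# Corrected worker function: never re-signal `cond`; turn it into data instead.
readJsonSafe <- function(file,
                         parse = function(f) jsonlite::fromJSON(f, flatten = TRUE)) {
  tryCatch(
    parse(file),
    error = function(cond) {
      # Store the plain message string, not the condition object, so nothing
      # is signalled again on the worker.
      structure(NA, error = conditionMessage(cond), file = file)
    }
  )
}

# Run on the cluster, then separate good results from skipped files on the master.
readAllJson <- function(cl, files, ...) {
  res <- parLapply(cl, files, readJsonSafe, ...)
  failed <- vapply(res, function(r) !is.null(attr(r, "error")), logical(1))
  if (any(failed)) {
    message("Skipped ", sum(failed), " file(s): ",
            paste(basename(files[failed]), collapse = ", "))
  }
  res[!failed]
}
```

On real data this would be called as dat <- readAllJson(cl, files). The workers need jsonlite installed, but since the default parser uses jsonlite::fromJSON, nothing has to be attached on them. Output printed on PSOCK workers is discarded by default; creating the cluster with makeCluster(..., outfile = "") leaves worker output unredirected, which may let you see it depending on the platform.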

answered Oct 10 '22 at 18:10 by Luke Tierney
