Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

An error in one job contaminates others with mclapply

Tags:

fork

r

mclapply

When mclapply(X, FUN) encounters errors for some of the values of X, the errors propagate to some (but not all) of the other values of X:

require(parallel)
test <- function(x) if(x == 3) stop() else x
mclapply(1:3, test, mc.cores = 2)

#[[1]]
#[1] "Error in FUN(c(1L, 3L)[[2L]], ...[cut]
#
#[[2]]
#[1] 2
#
#[[3]]
#[1] "Error in FUN(c(1L, 3L)[[2L]], ... [cut]

#Warning message:
#In mclapply(1:3, test, mc.cores = 2) :
#  scheduled core 1 encountered error in user code, all values of the job will be affected

How can I stop this happening?

like image 698
orizon Avatar asked Aug 20 '13 08:08

orizon


1 Answers

The trick is to set mc.preschedule = FALSE

mclapply(1:3, test, mc.cores = 2, mc.preschedule = FALSE)
#[[1]]
#[1] 1

#[[2]]
#[1] 2

#[[3]]
#[1] "Error in FUN(X[[nexti]], ...[cut]
#Warning message:
#In mclapply(1:3, test, mc.cores = 2, mc.preschedule = FALSE) :
#  1 function calls resulted in an error

This works because by default mclapply seems to divide X into mc.cores groups and applies a vectorized version of FUN to each group. As a result if any member of the group yields an error, all values in that group will yield the same error (but values in other groups are unaffected).

Setting mc.preschedule = FALSE has adverse effects and may make it impossible to reproduce a sequence of pseudo-random numbers where the same job always receives the same number in the sequence, see ?mcparallel under the heading Random numbers.

like image 58
orizon Avatar answered Sep 27 '22 16:09

orizon