Is mclapply guaranteed to return its results in order?

I'm working with mclapply from the multicore package (on Ubuntu), and I'm writing a function that requires the results of mclapply(x, f) to be returned in order (that is, f(x[1]), f(x[2]), ..., f(x[n])).

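# multicore doesn't work on Windows
# (in current R, mclapply is provided by the base 'parallel' package)
require(multicore)
unlist(mclapply(
    1:10,
    function(x) {
        Sys.sleep(sample(1:5, size = 1))  # random delay of 1 to 5 seconds
        identity(x)
    },
    mc.cores = 2))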

[1] 1 2 3 4 5 6 7 8 9 10

The above code seems to imply that mclapply returns results in the same order as lapply.
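A single run can only ever be suggestive, but the same experiment can be repeated systematically by comparing mclapply against plain lapply for several combinations of mc.cores and mc.preschedule. The sketch below assumes the base `parallel` package (the successor of multicore; mc.cores > 1 requires a Unix-alike), and the function `f` with its random delay is purely illustrative. Agreement here is empirical evidence, not a proof of the guarantee.

```r
library(parallel)  # provides mclapply(); forking needs a Unix-alike

# illustrative worker: a random delay so jobs finish in an unpredictable order
f <- function(x) {
  Sys.sleep(runif(1, 0, 0.2))
  x^2
}

X <- 1:12
reference <- lapply(X, f)

# every combination of core count and prescheduling to try
settings <- expand.grid(cores = c(1, 2, 4), preschedule = c(TRUE, FALSE))

same_order <- mapply(function(cores, preschedule) {
  res <- mclapply(X, f, mc.cores = cores, mc.preschedule = preschedule)
  identical(res, reference)  # TRUE only if values AND positions agree
}, settings$cores, settings$preschedule)

print(cbind(settings, same_order))
```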

However, if this assumption is wrong I'll have to spend a long time refactoring my code, so I'm hoping to get assurance from someone more familiar with this package/parallel computing that this assumption is correct.

Is it safe to assume that mclapply always returns its results in order, regardless of the optional arguments it is given?

asked Feb 04 '13 at 23:02 by Róisín Grannell


1 Answer

Short answer: it does return the results in the correct order.

But of course, you should read the code yourself: mclapply is a plain R function, so typing mclapply at the prompt prints its source.

The man page for collect gives some more hints:

Note: If expr uses low-level multicore functions such as sendMaster a single job can deliver results multiple times and it is the responsibility of the user to interpret them correctly.

However, if you don't use the low-level functions:

collect returns any results that are available in a list. The results will have the same order as the specified jobs. If there are multiple jobs and a job has a name it will be used to name the result, otherwise its process ID will be used.

(my emphasis)
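The ordering promise quoted from the collect man page can be checked directly. In the base `parallel` package, multicore's parallel() and collect() are called mcparallel() and mccollect(). The sketch below starts a deliberately slow job before a fast one; the job names and delays are illustrative assumptions. Per the documentation, the result list follows the order of the jobs passed in, not the order in which they finished, and named jobs name their results.

```r
library(parallel)  # mcparallel()/mccollect() fork, so this needs a Unix-alike

# start the slow job first; it will finish after the fast one
jobs <- list(
  mcparallel({ Sys.sleep(1.5); "slow" }, name = "slow"),
  mcparallel({ Sys.sleep(0.2); "fast" }, name = "fast")
)

# wait for both jobs; results are ordered like `jobs` and named by job name
res <- mccollect(jobs)
print(names(res))
print(unlist(res))
```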

Now for mclapply. A quick glance over the source code shows:

  • if !mc.preschedule and there are no more jobs than cores (length(X) <= cores), parallel and collect are used (see above).
  • if mc.preschedule is set, or there are more jobs than cores, mclapply itself takes care of the order (see the code).
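The second case can be illustrated with a simplified, sequential simulation: split the inputs into one chunk per core, process the chunks in an arbitrary order, then write each chunk's results back into their original positions. The helper preschedule_sketch is hypothetical and is not the package's source; it assumes a round-robin split (core 1 gets elements 1, 1 + cores, ...), which is consistent with the interleaved cat output of the mc.preschedule = TRUE run shown further down.

```r
# Hypothetical helper: a sequential simulation of prescheduled mclapply.
# Assumption: elements are dealt round-robin to the cores.
preschedule_sketch <- function(X, FUN, cores = 2) {
  cores <- min(cores, length(X))  # never more chunks than elements
  # chunk i gets positions i, i + cores, i + 2 * cores, ...
  sindex <- lapply(seq_len(cores), function(i) seq(i, length(X), by = cores))
  chunk_results <- vector("list", cores)
  # process chunks in reverse to mimic children finishing out of order
  for (i in rev(seq_len(cores))) {
    chunk_results[[i]] <- lapply(X[sindex[[i]]], FUN)
  }
  # reassemble: put each chunk's results back at their original positions
  res <- vector("list", length(X))
  for (i in seq_len(cores)) {
    res[sindex[[i]]] <- chunk_results[[i]]
  }
  res
}

out <- preschedule_sketch(1:10, function(x) x * 10, cores = 3)
identical(out, lapply(1:10, function(x) x * 10))
```

Because every result is written back by index rather than appended, the order in which chunks complete cannot affect the final list.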

However, here's a slightly modified version of your experiment:

> unlist (mclapply(1:10, function(x){
    Sys.sleep(sample(1:5, size = 1)); 
    cat (x, " ");    
    identity(x)}, 
  mc.cores = 2, mc.preschedule = FALSE))
1  2  4  3  6  5  7  8  9  10   [1]  1  2  3  4  5  6  7  8  9 10
> unlist (mclapply(1:10, function(x){
    Sys.sleep(sample(1:5, size = 1)); 
    cat (x, " ");    
    identity(x)}, 
  mc.cores = 2, mc.preschedule = TRUE))
1  3  2  5  4  6  7  8  10  9   [1]  1  2  3  4  5  6  7  8  9 10

This shows that the child jobs return their results in a different order (more precisely, they finish in a different order), but the result is assembled in the original order.

(This works in the console but not in RStudio: the cat() output from the child processes does not show up there.)
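Since cat() output from the children is not always visible, an alternative is to have each job return its own index together with a finishing timestamp, then compare the two orders. This is a sketch assuming the base `parallel` package on a Unix-alike; the delay values and the worker `work` are illustrative. The index column shows the position each result occupies in the returned list, while the timestamps reveal the order in which the jobs actually finished.

```r
library(parallel)  # mclapply() with mc.cores > 1 needs a Unix-alike (forking)

delays <- c(0.8, 0.1, 0.5, 0.2, 0.6, 0.05)  # deliberately uneven workloads

# illustrative worker: record which input it handled and when it finished
work <- function(i) {
  Sys.sleep(delays[i])
  c(index = i, finished = as.numeric(Sys.time()))
}

res <- mclapply(seq_along(delays), work, mc.cores = 2, mc.preschedule = FALSE)

returned_index <- vapply(res, function(r) r[["index"]], numeric(1))
finish_order <- order(vapply(res, function(r) r[["finished"]], numeric(1)))

print(returned_index)  # position in the result list
print(finish_order)    # inputs sorted by finishing time; typically not 1:6
```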

answered Sep 18 '22 at 12:09 by cbeleites unhappy with SX