I'm working with mclapply from the multicore package (on Ubuntu), and I'm writing a function that requires that the results of mclapply(x, f) are returned in order (that is, f(x[1]), f(x[2]), ..., f(x[n])).
# multicore doesn't work on Windows
require(multicore)
unlist(mclapply(
  1:10,
  function(x) {
    Sys.sleep(sample(1:5, size = 1))
    identity(x)
  }, mc.cores = 2))
[1] 1 2 3 4 5 6 7 8 9 10
The above code seems to imply that mclapply returns results in the same order as lapply.
However, if this assumption is wrong I'll have to spend a long time refactoring my code, so I'm hoping to get assurance from someone more familiar with this package/parallel computing that this assumption is correct.
Is it safe to assume that mclapply always returns its results in order, regardless of the optional arguments it is given?
Short answer: it does return the results in the correct order.
But of course, you should read the code yourself (mclapply is an R function...)
The man page for collect gives some more hints:
Note: If expr uses low-level multicore functions such as sendMaster a single job can deliver results multiple times and it is the responsibility of the user to interpret them correctly.
However, if you don't mess with the low-level functions:
collect returns any results that are available in a list. The results will have the same order as the specified jobs. If there are multiple jobs and a job has a name it will be used to name the result, otherwise its process ID will be used.
(my emphasis)
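To illustrate the documented ordering, here is a minimal sketch. It assumes the parallel package that ships with current R, where multicore's parallel and collect are available as mcparallel and mccollect. The job names a, b, c and the sleep times are made up for illustration. Like multicore, it relies on forking and so does not work on Windows.

```r
library(parallel)

# Three forked jobs that finish in reverse order of submission
# (job names and sleep times are made up for illustration)
delays <- c(a = 0.6, b = 0.3, c = 0.1)
jobs <- lapply(names(delays), function(nm) {
  d <- delays[[nm]]
  mcparallel({ Sys.sleep(d); nm }, name = nm)
})

# Wait for all jobs; results are listed in the order the jobs were given,
# not in the order they finished
res <- mccollect(jobs)
print(names(res))
```

Job c finishes first here, but it should still come last in res because it was the last job passed to mccollect.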
Now for mclapply. A quick glance over the source code yields:
If !mc.preschedule and there are no more jobs than cores (length(X) <= cores), parallel and collect are used; see above.
If mc.preschedule or there are more jobs than cores, mclapply itself takes care of the order; see the code.
However, here's a slightly modified version of your experiment:
> unlist(mclapply(1:10, function(x) {
    Sys.sleep(sample(1:5, size = 1))
    cat(x, " ")
    identity(x)
  },
  mc.cores = 2, mc.preschedule = FALSE))
1 2 4 3 6 5 7 8 9 10 [1] 1 2 3 4 5 6 7 8 9 10
> unlist(mclapply(1:10, function(x) {
    Sys.sleep(sample(1:5, size = 1))
    cat(x, " ")
    identity(x)
  },
  mc.cores = 2, mc.preschedule = TRUE))
1 3 2 5 4 6 7 8 10 9 [1] 1 2 3 4 5 6 7 8 9 10
This shows that the child jobs finish in a different order (more precisely, they are about to finish in a different order when cat prints), but the result is assembled in the original order.
(This works on the console, but not in RStudio: the cat output does not show up there.)
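If you would rather guard against the assumption than rely on reading the source, a small check like the following could go into your tests. This is only a sketch: it assumes the parallel package (which provides mclapply in current R) and a Unix-alike system, and slow_id is a made-up helper.

```r
library(parallel)

# Hypothetical helper: returns its input after a short random delay
slow_id <- function(x) {
  Sys.sleep(runif(1, 0, 0.2))
  x
}

x <- 1:10
expected <- lapply(x, slow_id)

# Compare mclapply against lapply for both scheduling modes
for (pre in c(TRUE, FALSE)) {
  got <- mclapply(x, slow_id, mc.cores = 2, mc.preschedule = pre)
  stopifnot(identical(got, expected))
}
```

The loop stops with an error if either scheduling mode ever returns the results out of order.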