
Custom package using parallel or doParallel for multiple OS as a CRAN package

I am building an R package that I want to be cross-platform. I am developing under Linux and plan to use the function mclapply from the parallel package. This package is not supported on Windows (which uses doParallel). I really like the parallel package for its simplicity and speed, though, and I do not know whether that is reason enough to publish two versions of the package on CRAN, one per OS (which seems like extra work to maintain), or whether that is even allowed.

Thoughts?

Also, for now I am regarding parallel's

mclapply(ldata, fun, mc.cores = cores)  # fun: placeholder for the function applied to each element

to be equivalent to doParallel's

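cl <- makeCluster(cores)
parLapply(cl, ldata, fun)  # fun: placeholder for the function applied to each element
stopCluster(cl)            # shut the workers down when finished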

Is that correct?

PascalVKooten asked Sep 03 '13 09:09



1 Answer

First, both mclapply and parLapply are in the parallel package, although mclapply doesn't actually run in parallel on Windows: there it only accepts mc.cores = 1 and falls back to a serial lapply. parLapply runs in parallel on all supported platforms, but isn't always as efficient as mclapply. The doParallel package is used with the foreach package, and acts as an adapter to the parallel package.

To write a package that works on both Windows and non-Windows, you have a variety of reasonable options:

  • Just use parLapply since it works everywhere
  • Use parLapply on Windows and mclapply elsewhere
  • Use doParallel with foreach
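
The second option can be sketched roughly as follows. This is a minimal illustration, not code from any existing package: the helper name par_lapply and its cores argument are hypothetical, and it assumes that checking .Platform$OS.type is an acceptable way to detect Windows.

```r
library(parallel)

# Hypothetical helper: applies `fun` to each element of `x`,
# choosing the parallel backend according to the platform.
par_lapply <- function(x, fun, ..., cores = 2L) {
  if (.Platform$OS.type == "windows") {
    # Forking is unavailable on Windows, so start a socket (PSOCK) cluster
    cl <- makeCluster(cores)
    on.exit(stopCluster(cl), add = TRUE)  # always shut the workers down
    parLapply(cl, x, fun, ...)
  } else {
    # Fork-based workers on Linux and macOS
    mclapply(x, fun, ..., mc.cores = cores)
  }
}

res <- par_lapply(1:4, function(i) i^2, cores = 2L)
```

Keeping the switch inside one function means the rest of the package never needs to know which backend is in use. Note that on the cluster branch the workers start as fresh R sessions, so any packages or objects that fun depends on would have to be loaded or exported to them explicitly.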

The doParallel package is convenient because it makes use of mclapply on non-Windows platforms. For example:

library(doParallel)
registerDoParallel()
foreach(i=1:10, .options.snow=list(preschedule=TRUE)) %dopar% {
    Sys.sleep(2)
}

This uses mclapply on Linux and Mac OS X, but will automatically create a PSOCK cluster object behind the scenes on Windows. The use of preschedule=TRUE (added in doParallel 1.0.3) will cause doParallel to preschedule the tasks using clusterApply internally, much like parLapply.

Note that if you explicitly create and register a cluster object, then mclapply will not be used, regardless of the platform. It will work fine, but may not be as efficient. To use mclapply, you must call registerDoParallel with a numeric argument, or no argument at all.
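
To make the difference between the two registration styles concrete, here is a small sketch. The core count of 2 and the toy sqrt loop are arbitrary choices for illustration; getDoParName() from foreach is used only to print which backend ended up registered.

```r
library(doParallel)  # also attaches foreach and parallel

# Numeric argument: lets doParallel pick the backend for the platform
# (fork-based on Linux/Mac OS X, an implicit cluster on Windows)
registerDoParallel(cores = 2)
print(getDoParName())
res_cores <- foreach(i = 1:4, .combine = c) %dopar% sqrt(i)

# Explicit cluster object: the cluster backend is used on every platform
cl <- makeCluster(2)
registerDoParallel(cl)
print(getDoParName())
res_cluster <- foreach(i = 1:4, .combine = c) %dopar% sqrt(i)

# Clean up: stop the workers and fall back to the sequential backend
stopCluster(cl)
registerDoSEQ()
```

Both loops return the same values; only the mechanism used to run the tasks differs.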

You can look at the source code for the boot package for an example of how to use either mclapply or parLapply depending on your platform.

Steve Weston answered Nov 07 '22 00:11
