Question
I've noticed that foreach/%dopar% performs sequential, not parallel, setup of a cluster before executing tasks in parallel. If each worker requires a dataset and it takes N seconds to transfer that dataset to a worker, then foreach/%dopar% spends #workers * N seconds on setup. This can be significant for a large number of workers or a large N (large datasets to transfer).
My question is whether this is by design or is there some parameter/setting that I'm missing in foreach or perhaps in cluster generation?
Setup
Example
library( foreach )
library( parallel )
library( doParallel )
# lots of data
data = rnorm( 100000000 )
# make cluster/register - creates 6 nodes fairly quickly
cluster = makePSOCKcluster( 6 , outfile = "" )
registerDoParallel( cluster )
# fire up Task Manager. Observe that each node receives data sequentially.
# When the last node gets its data, all nodes process at the same time
results = foreach( i = 1 : 500 ) %dopar%
{
print( data[ i ] )
return( data[ i ] )
}
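One way to see the per-worker setup cost directly is to time the export step itself. This is a small sketch, not part of the original question; the exact timings will vary with your machine, but the elapsed time of clusterExport should grow roughly linearly with the number of workers:

```r
library( parallel )

# smaller than the example above, so the test runs quickly (~80 MB of doubles)
data <- rnorm( 10000000 )

cl2 <- makePSOCKcluster( 2 )
cl6 <- makePSOCKcluster( 6 )

# clusterExport copies `data` to each worker in turn, so exporting to
# 6 workers should take roughly 3x as long as exporting to 2.
t2 <- system.time( clusterExport( cl2 , "data" ) )
t6 <- system.time( clusterExport( cl6 , "data" ) )

stopCluster( cl2 )
stopCluster( cl6 )
```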
Note that by default the makeCluster function creates a PSOCK cluster, which is an enhanced version of the SOCK cluster implemented in the snow package. Accordingly, a PSOCK cluster is a pool of worker processes that exchange data with the master process via sockets.
## When the HPC environment does not support SSH between compute nodes,
## one can use the 'pjrsh' command to launch the parallel workers:
cl <- makeClusterPSOCK( availableWorkers() , rshcmd = "pjrsh" , dryrun = TRUE , quiet = TRUE )
Workers will be started sequentially on other clusters, on all clusters with setup_strategy = "sequential", and on R 3.6.0 and older. This option is for expert use only (e.g. debugging) and may be removed in future versions of R.
Thanks to Rich at Revolution Computing for helping with this one....
clusterCall uses a for loop to send data to each worker. Because R is not multi-threaded, that for loop must run sequentially.
There are a few possible solutions (each of which would require someone to code them up): R could call out to C/C++ to thread the worker setup; the workers could pull the data from a file on disk themselves; or the workers could listen on a shared socket, so the master writes the data just once and it is broadcast to all workers.
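The second workaround, having each worker pull the data from disk, needs no changes to foreach itself and can be sketched as follows. This assumes all workers can see the same filesystem (always true for local PSOCK workers); the path is illustrative, and .noexport is the key detail, since foreach would otherwise auto-export the master's copy of data and reintroduce the sequential transfer:

```r
library( parallel )
library( doParallel )
library( foreach )

data <- rnorm( 1000000 )

# Write the dataset to disk once.
path <- file.path( tempdir() , "data.rds" )
saveRDS( data , path )

cluster <- makePSOCKcluster( 6 )
registerDoParallel( cluster )

# Send only the short path string to the workers, then let each worker
# read the file itself; the reads can proceed concurrently (limited by
# disk bandwidth) instead of the master serializing the object over
# each socket in turn.
clusterExport( cluster , "path" )
invisible( clusterEvalQ( cluster , data <- readRDS( path ) ) )

# .noexport stops foreach from shipping the master's copy of `data`.
results <- foreach( i = 1 : 500 , .noexport = "data" ) %dopar%
{
    data[ i ]
}

stopCluster( cluster )
```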