Question
I've noticed that foreach/%dopar% performs sequential, not parallel, setup of a cluster before executing tasks in parallel. If each worker requires a dataset and it takes N seconds to transfer that dataset to a worker, then foreach/%dopar% spends #workers * N seconds on setup. This can be significant for a large number of workers or a large N (large datasets to transfer).
My question is whether this is by design or is there some parameter/setting that I'm missing in foreach or perhaps in cluster generation?
Setup
Example
library( foreach )
library( parallel )
library( doParallel )
# lots of data
data = rnorm( 100000000 )
# make cluster/register - creates 6 nodes fairly quickly
cluster = makePSOCKcluster( 6 , outfile = "" )
registerDoParallel( cluster )
# fire up Task Manager. Observe that each node receives data sequentially.
# When the last node gets its data, all nodes process at the same time
results = foreach( i = 1 : 500 ) %dopar%
{
print( data[ i ] )
return( data[ i ] )
}
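One way to see the per-worker setup cost directly is to time the export step itself. This is a small sketch, not part of the original question; the exact timings will vary with your machine, but the elapsed time of clusterExport should grow roughly linearly with the number of workers:

```r
library( parallel )

# smaller than the example above, so the test runs quickly (~80 MB of doubles)
data <- rnorm( 10000000 )

cl2 <- makePSOCKcluster( 2 )
cl6 <- makePSOCKcluster( 6 )

# clusterExport copies `data` to each worker in turn, so exporting to
# 6 workers should take roughly 3x as long as exporting to 2.
t2 <- system.time( clusterExport( cl2 , "data" ) )
t6 <- system.time( clusterExport( cl6 , "data" ) )

stopCluster( cl2 )
stopCluster( cl6 )
```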
Note that by default the makeCluster function creates a PSOCK cluster, which is an enhanced version of the SOCK cluster implemented in the snow package. Accordingly, a PSOCK cluster is a pool of worker processes that exchange data with the master process via sockets.
## When the HPC environment does not support SSH between compute nodes,
## one can use the 'pjrsh' command to launch the parallel workers:
cl <- makeClusterPSOCK( availableWorkers() , rshcmd = "pjrsh" , dryrun = TRUE , quiet = TRUE )
Workers will be started sequentially on other clusters, on all clusters with setup_strategy = "sequential", and on R 3.6.0 and older. This option is for expert use only (e.g. debugging) and may be removed in future versions of R.
Thanks to Rich at Revolution Computing for helping with this one....
clusterCall uses a for loop to send data to each worker. Because R is not multi-threaded, that for loop must run sequentially.
There are a few possible solutions (each of which would require someone to code them up): R could call out to C/C++ to thread the worker setup; the workers could pull the data from a file on disk themselves; or the workers could listen on a shared socket, so the master writes the data just once and it is broadcast to all workers.
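The second workaround, having each worker pull the data from disk, needs no changes to foreach itself and can be sketched as follows. This assumes all workers can see the same filesystem (always true for local PSOCK workers); the path is illustrative, and .noexport is the key detail, since foreach would otherwise auto-export the master's copy of data and reintroduce the sequential transfer:

```r
library( parallel )
library( doParallel )
library( foreach )

data <- rnorm( 1000000 )

# Write the dataset to disk once.
path <- file.path( tempdir() , "data.rds" )
saveRDS( data , path )

cluster <- makePSOCKcluster( 6 )
registerDoParallel( cluster )

# Send only the short path string to the workers, then let each worker
# read the file itself; the reads can proceed concurrently (limited by
# disk bandwidth) instead of the master serializing the object over
# each socket in turn.
clusterExport( cluster , "path" )
invisible( clusterEvalQ( cluster , data <- readRDS( path ) ) )

# .noexport stops foreach from shipping the master's copy of `data`.
results <- foreach( i = 1 : 500 , .noexport = "data" ) %dopar%
{
    data[ i ]
}

stopCluster( cluster )
```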