doParallel, cluster vs cores

Tags: r, doparallel, rparallel

What is the difference between cluster and cores in registerDoParallel when using the doParallel package?

Is my understanding correct that on a single machine these are interchangeable and that I will get the same results for:

cl <- makeCluster(4)
registerDoParallel(cl)

and

registerDoParallel(cores = 4)

The only difference I see is that a cluster created with makeCluster() has to be stopped explicitly using stopCluster().

asked Mar 03 '15 10:03

Tomas Greif

2 Answers

I think the chosen answer is too general and not actually accurate, since it doesn't touch on the details of the doParallel package itself. The package vignette is actually pretty clear about this:

The parallel package is essentially a merger of the multicore package, which was written by Simon Urbanek, and the snow package, which was written by Luke Tierney and others. The multicore functionality supports multiple workers only on those operating systems that support the fork system call; this excludes Windows. By default, doParallel uses multicore functionality on Unix-like systems and snow functionality on Windows.

We will use snow-like functionality in this vignette, so we start by loading the package and starting a cluster

To use multicore-like functionality, we would specify the number of cores to use instead

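In summary, the behavior is system dependent. Cluster is the more general mode and covers all platforms, while the multicore mode selected by cores is available only on Unix-like systems (on Windows, doParallel uses snow functionality instead, as the vignette notes).

To keep the interface consistent, the package uses the same function, registerDoParallel, for both modes: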

> library(doParallel)
> cl <- makeCluster(4)
> registerDoParallel(cl)
> getDoParName()
[1] "doParallelSNOW"

> registerDoParallel(cores=4)
> getDoParName()
[1] "doParallelMC"
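
As a minimal sketch of the comparison asked about in the question, the following runs the same foreach loop under both registrations and checks that the results agree. The loop body (squaring 1:8) and the worker count of 2 are illustrative assumptions, and on Windows the second registration would be expected to report a snow-style backend rather than doParallelMC.

```r
library(doParallel)  # also attaches foreach and parallel

# Explicit cluster: works on every platform, must be stopped by hand.
cl <- makeCluster(2)
registerDoParallel(cl)
backend_cluster <- getDoParName()
res_cluster <- foreach(i = 1:8, .combine = c) %dopar% i^2
stopCluster(cl)

# Core count only: fork-based workers on Unix-like systems.
registerDoParallel(cores = 2)
backend_cores <- getDoParName()
res_cores <- foreach(i = 1:8, .combine = c) %dopar% i^2
stopImplicitCluster()  # no-op unless doParallel created an implicit cluster

# Return to the sequential backend so later %dopar% calls
# do not try to use workers that no longer exist.
registerDoSEQ()

# Same computation, same results, regardless of the backend used.
same_results <- identical(res_cluster, res_cores)
```

The two runs differ in how the workers are started and cleaned up, not in what the loop computes, which is why `same_results` is expected to be TRUE here.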

answered Oct 16 '22 11:10

dracodoc

Yes, from the software point of view your understanding is right:

on a single machine these are interchangeable and I will get the same results.

To understand 'cluster' and 'cores' clearly, I suggest thinking at the 'hardware' and 'software' levels.

At the hardware level, 'cluster' means network-connected machines that work together by communicating, for example over sockets (which needs extra init/stop operations, such as the stopCluster() you pointed out). 'Cores' means several hardware cores in the local CPU, which typically work together through shared memory (there is no need to send messages explicitly from A to B).
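
The contrast between explicit communication and shared memory can be sketched with the base parallel package directly. This is only an illustration under assumptions: the variable `offset` and the worker count are made up, and forked workers strictly get a copy-on-write view of the parent's memory rather than truly shared memory.

```r
library(parallel)

offset <- 100  # illustrative data living in the main R session

# Socket cluster: workers are fresh R processes, so data must be sent to them.
cl <- makeCluster(2)
clusterExport(cl, "offset", envir = environment())  # explicit message to each worker
res_socket <- parSapply(cl, 1:4, function(i) i + offset)
stopCluster(cl)

# Forked workers (Unix-like systems only): children inherit the parent's
# memory, so `offset` is visible without any export step.
if (.Platform$OS.type != "windows") {
  res_fork <- unlist(mclapply(1:4, function(i) i + offset, mc.cores = 2))
}
```

Both variants compute the same values; the socket version simply needs the extra `clusterExport()` and `stopCluster()` steps.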

At the software level, the boundary between cluster and cores is sometimes not that clear. A program can run locally on cores or remotely on a cluster, and the high-level software doesn't need to know the details. So we can mix the two modes, for example by using explicit communication locally (setting cl on one machine), and we can also run multiple cores on each of the remote machines.

Back to your question: are setting cl and setting cores equivalent?

From the software side, they are the same: the program is run by the same number of clients/servers and therefore produces the same results.

From the hardware side, it may be different: cl implies explicit communication and cores implies shared memory. However, if the high-level software is well optimized, both settings may go through the same flow on a local machine. I haven't looked into doParallel very deeply, so I am not sure whether these two are the same.

But in practice, it is better to specify cores for a single machine and cl for a cluster.
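
One way to follow that rule of thumb is a small wrapper that picks the registration style from its argument. This is a hedged sketch: `register_backend` is a hypothetical helper, not part of doParallel, and passing host names assumes the machines are already reachable for a socket cluster.

```r
library(doParallel)

# Hypothetical helper (not part of doParallel): register a backend from either
# a worker count (single machine) or a character vector of host names.
register_backend <- function(workers) {
  if (is.numeric(workers)) {
    registerDoParallel(cores = workers)  # single machine: just give a core count
    return(invisible(NULL))
  }
  cl <- makeCluster(workers)             # host names: explicit socket cluster
  registerDoParallel(cl)
  invisible(cl)                          # caller must call stopCluster() on this
}

register_backend(2)
n_workers <- getDoParWorkers()           # number of workers foreach will use

# Clean up: stop any implicit cluster and return to sequential execution.
stopImplicitCluster()
registerDoSEQ()
```

When host names are passed, the returned cluster object should be kept so that it can be stopped with `stopCluster()` once the work is done.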

Hope this helps you.

answered Oct 16 '22 11:10

Patric
