
Parallel computing with clusters other than snow SOCK

The recent addition of direct support for parallel computing in R 2.14 raised a question for me. There are numerous options for creating clusters in R. I use snow SOCK clusters on a regular basis, but I know that there are other ways, such as MPI. I use snow SOCK clusters because I do not need to install any additional software (I use Fedora 13).

So, my concrete questions:

  1. Is there a gain in performance when using non-SOCK clusters?
  2. Is it easier to create clusters on multiple computers using non-SOCK clusters?
Paul Hiemstra asked Dec 07 '11 09:12



1 Answer

1) Only a limited number of benchmarks are available, and they show that MPI is faster than sockets. But as an R user you will probably not care about the difference: it is on the order of milliseconds, and the number of communications is not that high in embarrassingly parallel problems.
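To get a feel for how small the communication cost is on your own machine, here is a rough sketch using the `parallel` package that ships with R since 2.14. It only measures a local socket cluster, not MPI, and the task sizes (200 trivial tasks, 4 heavy ones) are arbitrary values chosen for illustration.

```r
library(parallel)  # part of base R since 2.14

# Start two local worker processes that talk to the master over sockets.
cl <- makeCluster(2, type = "PSOCK")

# clusterApply() sends one task per message, so 200 trivial tasks
# mostly measure the round-trip communication cost.
comm <- system.time(
  clusterApply(cl, 1:200, function(i) i)
)[["elapsed"]]

# An embarrassingly parallel job: few messages, real computation in each.
work <- system.time(
  res <- clusterApply(cl, 1:4, function(i) sum(sqrt(seq_len(2e6))) + i)
)[["elapsed"]]

stopCluster(cl)

cat("200 trivial tasks:", comm, "seconds\n")
cat("4 heavy tasks:    ", work, "seconds\n")
```

If the first timing divided by 200 is tiny compared with the time of one heavy task, the transport (sockets or MPI) is unlikely to matter for your workload.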

2) Yes: with MPI you do not have to provide a list of machine names or IP addresses, which gets complicated for a computer cluster with 100 nodes. But everything depends on your computer cluster. In most cases MPI or PVM is already preinstalled and everything works out of the box using Rmpi, ...
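The difference in setup can be sketched as follows, again with the `parallel` package. The host names and the worker count are placeholders, and the two helper functions are hypothetical wrappers written only for this illustration. The socket variant assumes passwordless SSH to each host; the MPI variant assumes the snow and Rmpi packages plus a working MPI installation, since `parallel::makeCluster()` hands non-socket types on to snow.

```r
library(parallel)

# Socket cluster: the master must be told every host to start a worker on.
# The host names are placeholders, not real machines.
make_sock_cluster <- function(hosts = c("node01", "node02", "node03")) {
  makeCluster(hosts, type = "PSOCK")
}

# MPI cluster: only a worker count is given; the MPI runtime places the workers.
# Needs the snow and Rmpi packages and an MPI installation on the cluster.
make_mpi_cluster <- function(n_workers = 4) {
  makeCluster(n_workers, type = "MPI")
}

# Either kind of cluster is then used the same way, for example:
# cl <- make_mpi_cluster(4)
# parLapply(cl, 1:100, function(i) i^2)
# stopCluster(cl)
```

Once the cluster object exists, the rest of the code does not need to know which transport is underneath.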

Markus Schmidberger answered Sep 22 '22 22:09