The recent addition of direct support for parallel computing in R 2.14 sparked a question in my mind. There are numerous options for creating clusters in R. I use snow SOCK clusters on a regular basis, but I know that there are other ways, such as MPI. I use snow SOCK clusters because I do not need to install any additional software (I use Fedora 13).
So, my concrete questions:
Definition: A cluster is a type of parallel or distributed processing system, which consists of a collection of interconnected stand-alone computers working together cooperatively as a single, integrated computing resource. [1] A cluster usually runs a Linux-based operating system.
1) There is only a limited number of benchmarks available which prove that MPI is faster than sockets. But as an R user you probably will not care about these differences. They are in the range of milliseconds, and the number of communications is not that high in embarrassingly parallel problems.
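To get a feel for how small the per-message cost is relative to the work itself, here is a minimal sketch using the parallel package that ships with R 2.14 and later. It only measures a local socket cluster, so it is not an MPI-versus-socket benchmark; the helper `slow_square` and the 0.05 s sleep are made up for illustration.

```r
library(parallel)

# Two local worker processes connected over sockets.
cl <- makeCluster(2, type = "PSOCK")

# Mean round-trip time of a call that does no work on the workers,
# so the elapsed time is almost entirely communication overhead.
n_calls <- 100
comm <- system.time(
  for (i in seq_len(n_calls)) clusterCall(cl, function() NULL)
)[["elapsed"]] / n_calls

# An embarrassingly parallel job: few messages, and each task
# (hypothetical, simulated with Sys.sleep) takes far longer than a message.
slow_square <- function(x) {
  Sys.sleep(0.05)
  x^2
}
res <- parLapply(cl, 1:8, slow_square)

stopCluster(cl)

cat(sprintf("mean round trip per call: %.4f s\n", comm))
```

If the reported round trip is tiny compared with the time spent in each task, the choice of transport is unlikely to matter much for this kind of job.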
2) Yes, with MPI you do not have to provide a list of machine names or IPs. For a computer cluster with 100 nodes, maintaining such a list gets complicated. But everything depends on your computer cluster. In most cases MPI or PVM is already preinstalled and everything works out of the box using Rmpi, ...
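The difference in setup can be sketched roughly as follows. The wrapper `make_cluster` and its arguments are hypothetical names for illustration; the MPI branch assumes that the snow and Rmpi packages and a working MPI installation are available, and it is not exercised here. The socket branch uses "localhost" as a stand-in for real machine names.

```r
library(parallel)

# Hypothetical helper: build either a socket or an MPI cluster.
make_cluster <- function(hosts = NULL, n = 2, use_mpi = FALSE) {
  if (use_mpi) {
    # Assumption: snow, Rmpi and an MPI runtime are installed.
    # Only a worker count is passed; where the workers run is
    # left to the MPI configuration of the site.
    return(snow::makeMPIcluster(n))
  }
  # Socket cluster: one entry per worker has to be spelled out.
  if (is.null(hosts)) hosts <- rep("localhost", n)
  makeCluster(hosts, type = "PSOCK")
}

# Socket variant with an explicit (here: local) host list.
cl <- make_cluster(hosts = c("localhost", "localhost"))
n_workers <- length(cl)
squares <- unlist(parLapply(cl, 1:4, function(x) x^2))
stopCluster(cl)

print(squares)
```

On a machine with MPI set up, `make_cluster(n = 4, use_mpi = TRUE)` would be the corresponding call, and the rest of the code stays the same.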