doMC vs doSNOW vs doSMP vs doMPI: why aren't the various parallel backends for 'foreach' functionally equivalent?

I've got a few test pieces of code that I've been running on various machines, always with the same results. I thought the philosophy behind the various do... packages was that they could be used interchangeably as a backend for foreach's %dopar%. Why is this not the case?

For example, this code snippet works:

library(plyr)
library(doMC)
registerDoMC()
x <- data.frame(V= c("X", "Y", "X", "Y", "Z" ), Z = 1:5)
ddply(x, .(V), function(df) sum(df$Z),.parallel=TRUE)

While each of these code snippets fails:

library(plyr)
library(doSMP)
workers <- startWorkers(2)
registerDoSMP(workers)
x <- data.frame(V= c("X", "Y", "X", "Y", "Z" ), Z = 1:5)
ddply(x, .(V), function(df) sum(df$Z),.parallel=TRUE) 
stopWorkers(workers)

library(plyr)
library(snow)
library(doSNOW)
cl <- makeCluster(2, type = "SOCK")
registerDoSNOW(cl)
x <- data.frame(V= c("X", "Y", "X", "Y", "Z" ), Z = 1:5)
ddply(x, .(V), function(df) sum(df$Z),.parallel=TRUE) 
stopCluster(cl)

library(plyr)
library(doMPI)
cl <- startMPIcluster(count = 2)
registerDoMPI(cl)
x <- data.frame(V= c("X", "Y", "X", "Y", "Z" ), Z = 1:5)
ddply(x, .(V), function(df) sum(df$Z),.parallel=TRUE) 
closeCluster(cl)

In all four cases, foreach(i = 1:3,.combine = "c") %dopar% {sqrt(i)} yields the exact same result, so I know I have the packages installed and working properly on each machine I've tested them on.

What is doMC doing differently from doSMP, doSNOW, and doMPI?

Zach asked Apr 07 '11 23:04


1 Answer

doMC forks the current R process so it inherits all the existing variables. All the other do backends only pass on explicitly requested variables. Unfortunately I didn't realise that, and only tested with doMC - this is something I hope to fix in the next version of plyr.
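
To illustrate the difference with plain foreach, here is a minimal sketch assuming a two-worker SOCK cluster via doSNOW. The names offset and add_offset are made up for the example. Because add_offset reads offset from the global environment, and a socket worker starts with a fresh R session, offset has to be requested through foreach's .export argument (packages used inside the loop body would similarly be listed in .packages). Under doMC the forked workers would already see offset without any export.

```r
library(foreach)
library(doSNOW)

# Start two socket workers; each one is a fresh R session that
# does not share the master's global environment.
cl <- makeCluster(2, type = "SOCK")
registerDoSNOW(cl)

# Hypothetical names, used only for this illustration.
offset <- 10
add_offset <- function(i) i + offset

# add_offset appears in the loop body, so foreach ships it automatically.
# offset is only referenced inside add_offset, so it must be exported by name.
res <- foreach(i = 1:3, .combine = "c", .export = "offset") %dopar% {
  add_offset(i)
}

stopCluster(cl)
print(res)
```

Dropping .export = "offset" from the call is a quick way to reproduce the kind of failure described in the question on a non-forking backend.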

hadley answered Oct 04 '22 03:10