
Is there an efficient way to parallelize mapply?

I have many rows and on every row I compute the uniroot of a non-linear function. I have a quad-core Ubuntu machine which hasn't stopped running my code for two days now. Not surprisingly, I'm looking for ways to speed things up ;-)

After some research, I noticed that only one core is currently being used and that parallelization is the thing to do. Digging deeper, I came to the conclusion (maybe incorrectly?) that the package foreach isn't really meant for my problem because too much overhead is produced (see, for example, this SO question). A good alternative seems to be multicore for Unix machines. In particular, the pvec function seems to be the most efficient one, judging from its help page.

However, if I understand it correctly, this function only takes one vector and splits it up accordingly. I need a function that can be parallelized but takes multiple vectors (or a data.frame instead), just like mapply does. Is there anything out there that I missed?
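
(For illustration only, with sqrt as a placeholder function rather than my actual problem, this is roughly the pvec calling pattern I mean: a single vector goes in, gets split across cores, and the results are concatenated.)

library(parallel)   # pvec originally lived in multicore; it is also in parallel
pvec(1:1e6, sqrt, mc.cores = 4)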

Here is a small example of what I want to do. (Note that I include a plyr example here because it can be an alternative to the base mapply function and it has a .parallel option. However, it is slower in my implementation and, internally, it calls foreach to parallelize, so I think it won't help. Is that correct?)

library(plyr)
library(foreach)
n <- 10000
df <- data.frame(P   = rnorm(n, mean=100, sd=10),
                 B0  = rnorm(n, mean=40,  sd=5),
                 CF1 = rnorm(n, mean=30,  sd=10),
                 CF2 = rnorm(n, mean=30,  sd=5),
                 CF3 = rnorm(n, mean=90,  sd=8))

get_uniroot <- function(P, B0, CF1, CF2, CF3) {

  uniroot(function(x) {-P + B0 + CF1/x + CF2/x^2 + CF3/x^3}, 
          lower = 1,
          upper = 10,
          tol   = 0.00001)$root

}

system.time(x1 <- mapply(get_uniroot, df$P, df$B0, df$CF1, df$CF2, df$CF3))
   #user  system elapsed 
   #0.91    0.00    0.90 
system.time(x2 <- mdply(df, get_uniroot))
   #user  system elapsed 
   #5.85    0.00    5.85
system.time(x3 <- foreach(P=df$P, B0=df$B0, CF1=df$CF1, CF2=df$CF2, CF3=df$CF3, .combine = "c") %do% {
    get_uniroot(P, B0, CF1, CF2, CF3)})
   #user  system elapsed 
  # 10.30    0.00   10.36
all.equal(x1, x2$V1) #TRUE
all.equal(x1, x3)    #TRUE

Also, I tried to implement Ryan Thompson's chunkapply function from the SO link above (I only removed the doMC part because I couldn't install it; his own example still works after that adjustment), but I didn't get it to work for my case. However, since it uses foreach internally, I thought the same arguments mentioned above apply, so I didn't try for too long.

#chunkapply(get_uniroot, list(P=df$P, B0=df$B0, CF1=df$CF1, CF2=df$CF2, CF3=df$CF3))
#Error in { : task 1 failed - "invalid function value in 'zeroin'"

PS: I know that I could just increase tol to reduce the number of steps needed to find a root. However, I have already set tol as large as possible.

asked Jan 11 '12 by Christoph_J



3 Answers

I'd use the parallel package that's built into R 2.14 and work with matrices. You could then simply use mclapply like this:

library(parallel)

dfm <- as.matrix(df)
# one task per row: turn the row into a named argument list for get_uniroot
result <- mclapply(seq_len(nrow(dfm)),
                   function(x) do.call(get_uniroot, as.list(dfm[x, ])),
                   mc.cores = 4L)
unlist(result)

This basically does the same thing mapply does, but in parallel.
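
As a quick sanity check (an added note, assuming x1 from the question is still in the workspace), the parallel result should match the serial one:

all.equal(unlist(result), x1)   # expected TRUE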

But...

Mind you that parallelization always comes with some overhead as well. As I explained in the question you link to, going parallel only pays off if your inner function takes significantly longer to compute than the overhead involved. In your case, your uniroot function works pretty fast. You might then consider cutting your data frame into bigger chunks and combining both mapply and mclapply. A possible way to do this is:

ncores <- 4
# ncores + 1 break points, from nrow(df) down to 0
id <- floor(
        quantile(0:nrow(df),
                 1 - (0:ncores)/ncores
        )
      )
# each row of idm holds the (lower, upper) row bounds of one chunk
idm <- embed(id, 2)

mapply_uniroot <- function(id){
  tmp <- df[(id[1] + 1):id[2], ]
  mapply(get_uniroot, tmp$P, tmp$B0, tmp$CF1, tmp$CF2, tmp$CF3)
}
# iterate over the chunks in reverse so the results come back in row order
result <- mclapply(nrow(idm):1,
                   function(x) mapply_uniroot(idm[x, ]),
                   mc.cores = ncores)
final <- unlist(result)

This might need some tweaking, but it essentially breaks your df into exactly as many bits as there are cores and runs the mapply on every core. To show that this works:

> x1 <- mapply(get_uniroot, df$P, df$B0, df$CF1, df$CF2, df$CF3)
> all.equal(final,x1)
[1] TRUE
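
For the record (an added sketch, not part of the original answer), the same chunking can be written a bit more compactly with parallel::splitIndices, which partitions the row indices into roughly equal contiguous blocks:

chunks <- splitIndices(nrow(df), ncores)   # one index block per core
result2 <- mclapply(chunks, function(idx) {
  tmp <- df[idx, ]
  mapply(get_uniroot, tmp$P, tmp$B0, tmp$CF1, tmp$CF2, tmp$CF3)
}, mc.cores = ncores)
all.equal(unlist(result2), x1)   # expected TRUE
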
answered by Joris Meys


It's an old topic, but FYI: you now have parallel::mcmapply (see ?mcmapply for the documentation). Don't forget to set mc.cores (either as an argument or via options()). I usually use mc.cores = parallel::detectCores() - 1 to leave one CPU free for OS operations.

x4 <- parallel::mcmapply(get_uniroot, df$P, df$B0, df$CF1, df$CF2, df$CF3,
                         mc.cores = parallel::detectCores() - 1)
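
Since mcmapply relies on forking, it won't run in parallel on Windows. A rough equivalent there is a socket cluster with parallel::clusterMap (a minimal added sketch; the cluster size and the need to export get_uniroot are assumptions about your setup):

library(parallel)
cl <- makeCluster(max(detectCores() - 1, 1))
clusterExport(cl, "get_uniroot")   # make the worker processes aware of the function
x5 <- unlist(clusterMap(cl, get_uniroot, df$P, df$B0, df$CF1, df$CF2, df$CF3))
stopCluster(cl)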

answered by zipp


This isn't exactly a best practices suggestion, but considerable speed-up can be had by identifying the root for all parameters in a 'vectorized' fashion. For instance,

bisect <-
    function(f, interval, ..., lower=min(interval), upper=max(interval),
             f.lower=f(lower, ...), f.upper=f(upper, ...), maxiter=20)
{
    nrow <- length(f.lower)
    bounds <- matrix(c(lower, upper), nrow, 2, byrow=TRUE)
    for (i in seq_len(maxiter)) {
        ## move lower or upper bound to mid-point, preserving opposite signs
        mid <- rowSums(bounds) / 2
        updt <- ifelse(f(mid, ...) > 0, 0L, nrow) + seq_len(nrow)
        bounds[updt] <- mid
    }
    rowSums(bounds) / 2
}

and then

> system.time(x2 <- with(df, {
+     PB0 <- B0 - P   # P and B0 enter the equation only through their difference
+     f <- function(x, PB0, CF1, CF2, CF3)
+         PB0 + CF1/x + CF2/x^2 + CF3/x^3
+     bisect(f, c(1, 10), PB0, CF1, CF2, CF3)
+ }))
   user  system elapsed 
  0.180   0.000   0.181 
> range(x1 - x2)
[1] -6.282406e-06  6.658593e-06

versus about 1.3 s for applying uniroot separately to each row. Note that P and B0 enter the equation only through their difference, so they are combined into a single value, PB0 = B0 - P, ahead of time.

The bounds on the final value are +/- diff(interval) * (.5 ^ maxiter) or so; here that is 9 * 0.5^20, about 8.6e-6, which is in line with the differences from the uniroot results shown above. A fancier implementation would replace bisection with linear or quadratic interpolation (as in the reference cited in ?uniroot), but then uniformly efficient convergence (and error handling in all cases) would be trickier to arrange.
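
That bound can be checked numerically (a quick sketch added here, not part of the original answer; it reuses x2 and the same f): re-running bisect with a much larger maxiter gives a near-exact reference, and the maxiter = 20 result should agree with it to within diff(interval) * 0.5^20.

x2_ref <- with(df, {
    PB0 <- B0 - P
    f <- function(x, PB0, CF1, CF2, CF3)
        PB0 + CF1/x + CF2/x^2 + CF3/x^3
    bisect(f, c(1, 10), PB0, CF1, CF2, CF3, maxiter = 40)
})
max(abs(x2 - x2_ref)) <= 9 * 0.5^20   # expected TRUE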

answered by Martin Morgan