
Is there an efficient way to parallelize mapply?

I have many rows and on every row I compute the uniroot of a non-linear function. I have a quad-core Ubuntu machine which hasn't stopped running my code for two days now. Not surprisingly, I'm looking for ways to speed things up ;-)

After some research, I noticed that only one core is currently being used and that parallelization is the thing to do. Digging deeper, I came to the conclusion (maybe incorrectly?) that the package foreach isn't really meant for my problem because too much overhead is produced (see, for example, this SO question). A good alternative seems to be multicore for Unix machines. In particular, the pvec function seems to be the most efficient one, judging from its help page.

However, if I understand it correctly, this function only takes one vector and splits it up accordingly. I need a function that can be parallelized but takes multiple vectors (or a data.frame instead), just like mapply does. Is there anything out there that I missed?
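
(For illustration only, with sqrt as a placeholder function rather than my actual problem, this is roughly the pvec calling pattern I mean: a single vector goes in, gets split across cores, and the results are concatenated.)

library(parallel)   # pvec originally lived in multicore; it is also in parallel
pvec(1:1e6, sqrt, mc.cores = 4)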

Here is a small example of what I want to do. (Note that I include a plyr example here because it can be an alternative to the base mapply function and it has a .parallel option. However, it is slower in my implementation and, internally, it calls foreach to parallelize, so I think it won't help. Is that correct?)

library(plyr)
library(foreach)
n <- 10000
df <- data.frame(P   = rnorm(n, mean=100, sd=10),
                 B0  = rnorm(n, mean=40,  sd=5),
                 CF1 = rnorm(n, mean=30,  sd=10),
                 CF2 = rnorm(n, mean=30,  sd=5),
                 CF3 = rnorm(n, mean=90,  sd=8))

get_uniroot <- function(P, B0, CF1, CF2, CF3) {

  uniroot(function(x) {-P + B0 + CF1/x + CF2/x^2 + CF3/x^3}, 
          lower = 1,
          upper = 10,
          tol   = 0.00001)$root

}

system.time(x1 <- mapply(get_uniroot, df$P, df$B0, df$CF1, df$CF2, df$CF3))
   #user  system elapsed 
   #0.91    0.00    0.90 
system.time(x2 <- mdply(df, get_uniroot))
   #user  system elapsed 
   #5.85    0.00    5.85
system.time(x3 <- foreach(P=df$P, B0=df$B0, CF1=df$CF1, CF2=df$CF2, CF3=df$CF3, .combine = "c") %do% {
    get_uniroot(P, B0, CF1, CF2, CF3)})
   #user  system elapsed 
  # 10.30    0.00   10.36
all.equal(x1, x2$V1) #TRUE
all.equal(x1, x3)    #TRUE

Also, I tried to implement Ryan Thompson's chunkapply function from the SO link above (I only removed the doMC part because I couldn't install it; his own example still works after that adjustment), but I didn't get it to work for my case. However, since it uses foreach internally, I thought the same arguments mentioned above apply, so I didn't try for too long.

#chunkapply(get_uniroot, list(P=df$P, B0=df$B0, CF1=df$CF1, CF2=df$CF2, CF3=df$CF3))
#Error in { : task 1 failed - "invalid function value in 'zeroin'"

PS: I know that I could just increase tol to reduce the number of steps needed to find a root. However, I have already set tol as large as possible.

asked Jan 11 '12 by Christoph_J



3 Answers

I'd use the parallel package that's built into R 2.14 and work with matrices. You could then simply use mclapply like this:

library(parallel)

dfm <- as.matrix(df)
# one task per row: turn the row into a named argument list for get_uniroot
result <- mclapply(seq_len(nrow(dfm)),
                   function(x) do.call(get_uniroot, as.list(dfm[x, ])),
                   mc.cores = 4L)
unlist(result)

This basically does the same thing mapply does, but in parallel.
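
As a quick sanity check (an added note, assuming x1 from the question is still in the workspace), the parallel result should match the serial one:

all.equal(unlist(result), x1)   # expected TRUE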

But...

Mind you that parallelization always comes with some overhead as well. As I explained in the question you link to, going parallel only pays off if your inner function takes significantly longer to compute than the overhead involved. In your case, your uniroot function works pretty fast. You might then consider cutting your data frame into bigger chunks and combining both mapply and mclapply. A possible way to do this is:

ncores <- 4
# ncores + 1 break points, from nrow(df) down to 0
id <- floor(
        quantile(0:nrow(df),
                 1 - (0:ncores)/ncores
        )
      )
# each row of idm holds the (lower, upper) row bounds of one chunk
idm <- embed(id, 2)

mapply_uniroot <- function(id){
  tmp <- df[(id[1] + 1):id[2], ]
  mapply(get_uniroot, tmp$P, tmp$B0, tmp$CF1, tmp$CF2, tmp$CF3)
}
# iterate over the chunks in reverse so the results come back in row order
result <- mclapply(nrow(idm):1,
                   function(x) mapply_uniroot(idm[x, ]),
                   mc.cores = ncores)
final <- unlist(result)

This might need some tweaking, but it essentially breaks your df into exactly as many bits as there are cores and runs the mapply on every core. To show that this works:

> x1 <- mapply(get_uniroot, df$P, df$B0, df$CF1, df$CF2, df$CF3)
> all.equal(final,x1)
[1] TRUE
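
For the record (an added sketch, not part of the original answer), the same chunking can be written a bit more compactly with parallel::splitIndices, which partitions the row indices into roughly equal contiguous blocks:

chunks <- splitIndices(nrow(df), ncores)   # one index block per core
result2 <- mclapply(chunks, function(idx) {
  tmp <- df[idx, ]
  mapply(get_uniroot, tmp$P, tmp$B0, tmp$CF1, tmp$CF2, tmp$CF3)
}, mc.cores = ncores)
all.equal(unlist(result2), x1)   # expected TRUE
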
answered by Joris Meys


It's an old topic, but FYI: you now have parallel::mcmapply (see ?mcmapply for the documentation). Don't forget to set mc.cores (either as an argument or via options()). I usually use mc.cores = parallel::detectCores() - 1 to leave one CPU free for OS operations.

x4 <- parallel::mcmapply(get_uniroot, df$P, df$B0, df$CF1, df$CF2, df$CF3,
                         mc.cores = parallel::detectCores() - 1)
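
Since mcmapply relies on forking, it won't run in parallel on Windows. A rough equivalent there is a socket cluster with parallel::clusterMap (a minimal added sketch; the cluster size and the need to export get_uniroot are assumptions about your setup):

library(parallel)
cl <- makeCluster(max(detectCores() - 1, 1))
clusterExport(cl, "get_uniroot")   # make the worker processes aware of the function
x5 <- unlist(clusterMap(cl, get_uniroot, df$P, df$B0, df$CF1, df$CF2, df$CF3))
stopCluster(cl)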

answered by zipp


This isn't exactly a best practices suggestion, but considerable speed-up can be had by identifying the root for all parameters in a 'vectorized' fashion. For instance,

bisect <-
    function(f, interval, ..., lower=min(interval), upper=max(interval),
             f.lower=f(lower, ...), f.upper=f(upper, ...), maxiter=20)
{
    nrow <- length(f.lower)
    bounds <- matrix(c(lower, upper), nrow, 2, byrow=TRUE)
    for (i in seq_len(maxiter)) {
        ## move lower or upper bound to mid-point, preserving opposite signs
        mid <- rowSums(bounds) / 2
        updt <- ifelse(f(mid, ...) > 0, 0L, nrow) + seq_len(nrow)
        bounds[updt] <- mid
    }
    rowSums(bounds) / 2
}

and then

> system.time(x2 <- with(df, {
+     PB0 <- B0 - P   # P and B0 enter the equation only through their difference
+     f <- function(x, PB0, CF1, CF2, CF3)
+         PB0 + CF1/x + CF2/x^2 + CF3/x^3
+     bisect(f, c(1, 10), PB0, CF1, CF2, CF3)
+ }))
   user  system elapsed 
  0.180   0.000   0.181 
> range(x1 - x2)
[1] -6.282406e-06  6.658593e-06

versus about 1.3 s for applying uniroot separately to each row. Note that P and B0 enter the equation only through their difference, so they are combined into a single value, PB0 = B0 - P, ahead of time.

The bounds on the final value are +/- diff(interval) * (.5 ^ maxiter) or so; here that is 9 * 0.5^20, about 8.6e-6, which is in line with the differences from the uniroot results shown above. A fancier implementation would replace bisection with linear or quadratic interpolation (as in the reference cited in ?uniroot), but then uniformly efficient convergence (and error handling in all cases) would be trickier to arrange.
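
That bound can be checked numerically (a quick sketch added here, not part of the original answer; it reuses x2 and the same f): re-running bisect with a much larger maxiter gives a near-exact reference, and the maxiter = 20 result should agree with it to within diff(interval) * 0.5^20.

x2_ref <- with(df, {
    PB0 <- B0 - P
    f <- function(x, PB0, CF1, CF2, CF3)
        PB0 + CF1/x + CF2/x^2 + CF3/x^3
    bisect(f, c(1, 10), PB0, CF1, CF2, CF3, maxiter = 40)
})
max(abs(x2 - x2_ref)) <= 9 * 0.5^20   # expected TRUE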

answered by Martin Morgan