
What is the easiest way to parallelize a vectorized function in R?

I have a very large list X and a vectorized function f. I want to calculate f(X), but this will take a long time if I do it with a single core. I have (access to) a 48-core server. What is the easiest way to parallelize the calculation of f(X)? The following is not the right answer:

library(foreach)
library(doMC)
registerDoMC()

foreach(x=X, .combine=c) %dopar% f(x)

The above code will indeed parallelize the calculation of f(X), but it will do so by applying f separately to every element of X. This ignores the vectorized nature of f and will probably make things slower as a result, not faster. Rather than applying f elementwise to X, I want to split X into reasonably-sized chunks and apply f to those.

So, should I just manually split X into 48 equal-sized sublists and then apply f to each in parallel, then manually put together the result? Or is there a package designed for this?

In case anyone is wondering, my specific use case is here.

Ryan C. Thompson asked Apr 06 '11 19:04



1 Answer

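Although this is an older question, this might be interesting for anyone who stumbles upon it via Google (like me): have a look at the pvec function in the multicore package. I think it does exactly what you want: it splits the vector into chunks, applies the function to each chunk on a separate core, and combines the results.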
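As a rough sketch of how that might look: the functionality of multicore has since been folded into the parallel package that ships with R, so the example below loads parallel instead. The function f and the input X here are hypothetical stand-ins for the ones in the question, X is assumed to be an atomic numeric vector rather than a list, and the core count of 2 is only a placeholder (on the 48-core server you would presumably raise it). The second half shows the manual "split, apply, recombine" approach the question describes, for comparison.

```r
library(parallel)  # provides pvec() and mclapply()

# Hypothetical stand-in for the vectorized function f from the question.
f <- function(x) sqrt(x) + 1

# Hypothetical stand-in for the large input X (assumed to be a numeric vector).
X <- as.numeric(1:100000)

# Forking is not available on Windows, so fall back to a single core there.
# The value 2 is a placeholder; adjust it to the machine at hand.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# pvec() splits X into n_cores contiguous segments, calls f once per
# segment in a forked worker, and concatenates the results in order.
res_pvec <- pvec(X, f, mc.cores = n_cores)

# Manual equivalent: build contiguous chunks, apply f to each chunk in
# parallel with mclapply(), then flatten the list of results.
chunk_id <- ceiling(seq_along(X) * n_cores / length(X))
chunks <- split(X, chunk_id)
res_manual <- unlist(mclapply(chunks, f, mc.cores = n_cores), use.names = FALSE)
```

Both variants call f once per chunk rather than once per element, so the vectorized nature of f is preserved. Note that pvec expects f to return a result of the same length as its input.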

Jonas Rauch answered Oct 16 '22 15:10