 

How do I parallelize in R on Windows (with an example)?

How do I get parallelization of code to work in R on Windows? Include a simple example. I'm posting this self-answered question because it was rather unpleasant to get working. You'll find that package parallel's fork-based functions such as mclapply() do NOT parallelize on Windows, but package snow works very well.

asked May 29 '14 05:05 by Carbon




2 Answers

Posting this because it took me bloody forever to figure out. Here's a simple example of parallelization in R that will let you test whether things are working for you and get you on the right path.

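```r
library(snow)

z <- 1:4

# Sequential baseline: four 1-second sleeps take about 4 seconds
system.time(lapply(z, function(x) Sys.sleep(1)))

# Replace 4 with your number of cores
cl <- makeCluster(4, type = "SOCK")

# Parallel version: about 1 second with 4 workers
system.time(clusterApply(cl, z, function(x) Sys.sleep(1)))

stopCluster(cl)
```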
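For comparison, here is a minimal sketch using only the base package parallel (shipped with R since 2.14.0). Its makeCluster() with type = "PSOCK" creates the same kind of socket cluster and does work on Windows; it is the fork-based mclapply() that cannot run in parallel there. The worker count of 2 and the offset variable are illustrative assumptions. Because socket workers start as fresh R sessions, global variables have to be sent to them explicitly with clusterExport().

```r
library(parallel)

cl <- makeCluster(2, type = "PSOCK")  # 2 workers; adjust to your core count

# PSOCK workers are fresh R sessions, so globals must be exported to them
offset <- 100
clusterExport(cl, "offset", envir = environment())

res <- parLapply(cl, 1:4, function(x) x + offset)
stopCluster(cl)  # always shut the workers down

print(unlist(res))
```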

You should also use the doSNOW package to register foreach with the snow cluster; code that uses foreach's %dopar% operator, including packages built on foreach, will then run on the cluster automatically. Register with registerDoSNOW(cl) (where cl is the value returned by makeCluster()), and undo the registration with registerDoSEQ(). Don't forget to shut down your cluster with stopCluster(cl) when you're done.
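A minimal sketch of that doSNOW registration, assuming the foreach, snow and doSNOW packages are installed; the two-worker cluster and the squares computation are illustrative only.

```r
library(foreach)
library(snow)
library(doSNOW)

cl <- makeCluster(2, type = "SOCK")  # 2 workers; adjust to your core count
registerDoSNOW(cl)                   # %dopar% now dispatches to the snow cluster

squares <- foreach(i = 1:4, .combine = c) %dopar% {
  i^2
}

registerDoSEQ()  # back to sequential execution
stopCluster(cl)  # shut the workers down

print(squares)
```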

answered Oct 07 '22 22:10 by Carbon


This worked for me. I used the doParallel package, which required three lines of code:

```r
# process in parallel
library(doParallel)
cl <- makeCluster(detectCores(), type = 'PSOCK')
registerDoParallel(cl)

# turn parallel processing off and run sequentially again:
registerDoSEQ()
stopCluster(cl)  # also shut down the worker processes
```

Calculation of a random forest decreased from 180 secs to 120 secs (on a Windows computer with 4 cores).
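The answer doesn't show how the random forest was fit. One common pattern, sketched here as an assumption rather than the answerer's actual code, is to grow several smaller forests with foreach's %dopar% on the registered doParallel backend and merge them with randomForest::combine(). It assumes the randomForest package is installed and uses the built-in iris data and two workers purely for illustration; getDoParWorkers() confirms how many workers the backend sees.

```r
library(doParallel)    # also attaches foreach and parallel
library(randomForest)  # assumption: any foreach-aware model code would do

cl <- makeCluster(2, type = "PSOCK")
registerDoParallel(cl)
n_workers_used <- getDoParWorkers()  # number of registered workers

# Grow four 125-tree forests in parallel, then merge them into one
# 500-tree forest (norm.votes = FALSE so votes can be combined as counts)
rf <- foreach(ntree = rep(125, 4),
              .combine = randomForest::combine,
              .multicombine = TRUE,
              .packages = "randomForest") %dopar% {
  randomForest(Species ~ ., data = iris, ntree = ntree, norm.votes = FALSE)
}

registerDoSEQ()  # back to sequential execution
stopCluster(cl)

print(rf$ntree)
```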

answered Oct 08 '22 00:10 by Sander van den Oord