
How to show the progress of code in parallel computation in R?

I am working with a large dataset, and some functions may take hours to run. How can I show the progress of the code, either with a progress bar or a running count (1, 2, 3, ..., 100)? I also want to store the result as a data frame with two columns. Here is an example. Thanks.

require(foreach)
require(doParallel)
require(Kendall)

cores=detectCores()
cl <- makeCluster(cores-1)
registerDoParallel(cl)

mydata=matrix(rnorm(8000*500),ncol = 500)
result=as.data.frame(matrix(nrow = 8000,ncol = 2))
pb <- txtProgressBar(min = 1, max = 8000, style = 3)

foreach(i=1:8000,.packages = "Kendall",.combine = rbind) %dopar%         
{
  abc=MannKendall(mydata[i,])
  result[i,1]=abc$tau
  result[i,2]=abc$sl
  setTxtProgressBar(pb, i)
}
close(pb)
stopCluster(cl)

However, when I run the code, no progress bar shows up and the result is not right. Any suggestions? Thanks.

Yang Yang asked Nov 18 '16 19:11



2 Answers

The doSNOW package has support for progress bars, while doParallel does not. Here's a way to put a progress bar in your example:

require(doSNOW)
require(Kendall)

# Start a socket cluster and register it as the foreach backend
cores <- parallel::detectCores()
cl <- makeSOCKcluster(cores)
registerDoSNOW(cl)

mydata <- matrix(rnorm(8000*500), ncol=500)

# progress() is called in the main session as tasks finish
pb <- txtProgressBar(min=1, max=8000, style=3)
progress <- function(n) setTxtProgressBar(pb, n)
opts <- list(progress=progress)

# Each iteration returns a one-row data frame; .combine='rbind' stacks them
result <-
  foreach(i=1:8000, .packages="Kendall", .options.snow=opts,
          .combine='rbind') %dopar% {
    abc <- MannKendall(mydata[i,])
    data.frame(tau=abc$tau, sl=abc$sl)
  }
close(pb)
stopCluster(cl)
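
As a side note on why the loop in the question misbehaves: with %dopar% the loop body runs in separate worker processes, so writing to `result[i, ]` there only changes a copy inside the worker, and a progress bar updated on a worker is not drawn in the main console. The value has to be returned from the body and collected, as in the code above. The same effect can be shown with base R's parallel package alone; this is only a minimal sketch with a toy computation (squaring numbers) standing in for MannKendall, and the variable names are made up for illustration.

```r
library(parallel)

cl <- makeCluster(2)

# A result vector that lives in the main R session.
result <- rep(NA_real_, 4)

# Each worker gets its own copy of `result`; writing to it there
# does not change the copy in the main session.
clusterExport(cl, "result", envir = environment())
ignored <- parLapply(cl, 1:4, function(i) {
  result[i] <- i^2   # modifies a copy inside the worker only
  NULL
})
print(result)        # still all NA in the main session

# Returning the value from the function and collecting it works.
collected <- unlist(parLapply(cl, 1:4, function(i) i^2))
print(collected)     # 1 4 9 16

stopCluster(cl)
```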

Steve Weston answered Oct 23 '22 23:10


I think the pbapply package also does the job.

require(parallel)
require(pbapply)

mydata <- matrix(rnorm(8000*500), ncol = 500)

# Start the cluster, then give every worker the data and the Kendall package
cores <- detectCores()
cl <- makeCluster(cores - 1)
parallel::clusterExport(cl = cl, varlist = c("mydata"))
parallel::clusterEvalQ(cl = cl, library(Kendall))

# pblapply() shows a progress bar and returns a list of one-row data frames
result <- pblapply(cl = cl,
                   X = 1:8000,
                   FUN = function(i) {
  abc <- MannKendall(mydata[i, ])
  row <- as.data.frame(matrix(nrow = 1, ncol = 2))
  row[1, 1] <- abc$tau
  row[1, 2] <- abc$sl
  return(row)
})

# Stack the rows into a single two-column data frame
result <- dplyr::bind_rows(result)
stopCluster(cl)

From the documentation, if a socket cluster is provided via cl, then pblapply() calls parLapply():

Parallel processing can be enabled through the cl argument. parLapply is called when cl is a 'cluster' object, mclapply is called when cl is an integer.
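
The general idea of combining parLapply() with a progress bar can be sketched in base R: split the input into chunks, run parLapply() on one chunk at a time, and advance the bar in the main session after each chunk. This is only a rough illustration under that assumption, not pbapply's actual code; `par_lapply_progress` and its `n_chunks` argument are hypothetical names.

```r
library(parallel)

# Hypothetical helper (not part of pbapply): run FUN over X on a cluster,
# one chunk at a time, advancing a text progress bar after each chunk.
par_lapply_progress <- function(cl, X, FUN, n_chunks = 20) {
  if (length(X) == 0) return(list())
  n_chunks <- max(1, min(n_chunks, length(X)))

  # Assign each position of X to one of n_chunks contiguous groups.
  idx <- seq_along(X)
  groups <- split(idx, ceiling(idx * n_chunks / length(X)))

  pb <- txtProgressBar(min = 0, max = length(groups), style = 3)
  on.exit(close(pb))

  out <- vector("list", length(groups))
  for (g in seq_along(groups)) {
    out[[g]] <- parLapply(cl, X[groups[[g]]], FUN)
    setTxtProgressBar(pb, g)  # runs in the main session, so it is displayed
  }

  # Flatten the list of per-chunk lists back into one list of results.
  unlist(out, recursive = FALSE)
}

cl <- makeCluster(2)
squares <- par_lapply_progress(cl, 1:200, function(i) i^2)
stopCluster(cl)
```

More chunks give a smoother bar but add communication overhead, since the main session waits for every chunk to finish before sending the next one.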

David Mas answered Oct 23 '22 23:10