Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is it possible to get a progress bar with foreach and a "multicore-kind" of backend

While using "multicore" parallelism using foreach and the doMC backend (I use doMC as at the time I looked into it other package did not allow logging from the I would like to get a progress bar, using the progress package, but any progress (that works on a linux terminal ie no tcltk popups) could do.

Given it uses forking I can imagine it might not be possible but I am not sure.

The intended use is to indicate progress when I load an concatenate 100's of files in parallel (usually within a #!Rscript)

I've looked at the few posts like How do you create a progress bar when using the “foreach()” function in R?. Happy to award a bounty on this.

EDIT

500 points bounty offered for someone showing me how to

  1. using foreach and a multicore (forking) type of parallelism
  2. get a progress bar
  3. get logging using futile.logger

Reprex

# load packages                                                                                                        
library("futile.logger")                                                                                               
library("data.table")                                                                                                  
library("foreach")                                                                                                     
# create temp dir                                                                                                      
tmp_dir <- tempdir()                                                                                                   
# create names for 200 files to be created                                                                             
nb_files <- 200L                                                                                                       
file_names <- file.path(tmp_dir, sprintf("file_%s.txt", 1:nb_files))                                                   
# make it reproducible                                                                                                 
set.seed(1L)                                                                                                           
nb_rows <- 1000L                                                                                                       
nb_columns <- 10L                                                                                                      
# create those 200 files sequentially                                                                                  
foreach(file_i = file_names) %do%                                                                                      
{                                                                                                                      
    DT <- as.data.table(matrix(data = runif(n = nb_rows * nb_columns), nrow = nb_rows))                                
    fwrite(x = DT, file = file_i)                                                                                      
    flog.info("Creating file %s", file_i)                                                                              
} -> tmp                                                                                                               

# Load back the files                                                                                                  
foreach(file_i = file_names, .final = rbindlist) %dopar%                                                               
{                                                                                                                      
    flog.info("Loading file %s", file_i)                                                                               
    # >>> SOME PROGRESS BAR HERE <<<                                                                                   
    fread(file_i)                                                                                                      
} -> final_data                                                                                                        
# show data                                                                                                            
final_data                                                                                                             

Desired output

Note that the progress bar is not messed up with the print lines)

INFO [2018-07-18 19:03:48] Loading file /tmp/RtmpB13Tko/file_197.txt
INFO [2018-07-18 19:03:48] Loading file /tmp/RtmpB13Tko/file_198.txt
INFO [2018-07-18 19:03:48] Loading file /tmp/RtmpB13Tko/file_199.txt
INFO [2018-07-18 19:03:48] Loading file /tmp/RtmpB13Tko/file_200.txt
[ =======>                          ] 4% 

EDIT 2

After the bounty ended nothing comes close to the expected result.

Logging within the progress bar messes everything. If someone gets the correct result I'll give another result-based bounty.

like image 601
statquant Avatar asked Jul 06 '18 15:07

statquant


2 Answers

Here's a solution (not perfect) using custom function.

This function outputs to console (using message) progress bar.

  • ii is current iteration.
  • N is total number of iterations to perform.
  • per is step (percent) when to update the progress bar. We need this as when multiple iterations are being performed progress bar gets updated too often and output is messed up.

Function:

progBar <- function(ii, N, per = 10) {
    if (ii %in% seq(1, N, per)) {
        x <- round(ii * 100 / N)
        message("[ ", 
                paste(rep("=", x), collapse = ""),
                paste(rep("-", 100 - x), collapse = ""), 
                " ] ", x, "%", "\r",
                appendLF = FALSE)
        if (ii == N) cat("\r")
    }
}

Code to test:

library(doMC)
library(foreach)
registerDoMC(10)

nIteration <- 1e3
foreach(i = 1:nIteration, ii = icount()) %dopar% {
    # For progBar ii I'm using icount(), because
    # user might iterate over all kind of objects
    progBar(ii, nIteration)
    Sys.sleep(1 / 10)
}

enter image description here

PS: It's not perfect, because:

  • Bar not always runs to 100% (depending on the number of iterations it can stop at 99%)
  • Sometimes output messes up (depends on number of iterations and how often they switch) - still debugging this part
  • Console is not flushed if you use print/cat within foreach
like image 71
pogibas Avatar answered Oct 21 '22 17:10

pogibas


You can refer to this link Progress bar parallel for the few insights (May be not the exact solution) which will help in creating a progress bar parallel.

The txtProgressBar only works when the stype is 2 or 3

library("foreach")
library("doParallel")
library("progress")

registerDoParallel(parallel::makeCluster(7, outfile = ""))

pb <- progress_bar$new(
            format = " [:bar] :percent in :elapsed",
            total = 30, clear = FALSE, width = 80, force = T)
a <- foreach (i  = 1:30) %dopar% {
    pb$tick()
    Sys.sleep(0.5)
}


pb <- txtProgressBar(title = "Iterative training", min = 0, max = 30, style = 3)

foreach (i  = 1:30) %dopar% {
    setTxtProgressBar(pb, i)
    Sys.sleep(0.5)
}

Do refer this link Monitoring the function with progress bar for the different ways a progress bar can be implemented depending on the needs.

Using Multicore: You can register a different parallel backend later, or deregister doMC by registering the sequential backend by calling the registerDoSEQ function. For example consider the following program

> x <- iris[which(iris[,5] != "setosa"), c(1,5)]
> trials <- 10000
> ptime <- system.time({
+ r <- foreach(icount(trials), .combine=cbind) %dopar% {
+ ind <- sample(100, 100, replace=TRUE)
+ result1 <- glm(x[ind,2]~x[ind,1], family=binomial(logit))
+ coefficients(result1)
+ }
+ })[3]
> ptime
like image 1
Subash J Avatar answered Oct 21 '22 17:10

Subash J