While using "multicore" parallelism using foreach and the doMC backend (I use doMC as at the time I looked into it other package did not allow logging from the I would like to get a progress bar, using the progress package, but any progress (that works on a linux terminal ie no tcltk popups) could do.
Given it uses forking I can imagine it might not be possible but I am not sure.
The intended use is to indicate progress when I load an concatenate 100's of files in parallel (usually within a #!Rscript)
I've looked at the few posts like How do you create a progress bar when using the “foreach()” function in R?. Happy to award a bounty on this.
EDIT
500 points bounty offered for someone showing me how to
Reprex
# load packages
library("futile.logger")
library("data.table")
library("foreach")
# create temp dir
tmp_dir <- tempdir()
# create names for 200 files to be created
nb_files <- 200L
file_names <- file.path(tmp_dir, sprintf("file_%s.txt", 1:nb_files))
# make it reproducible
set.seed(1L)
nb_rows <- 1000L
nb_columns <- 10L
# create those 200 files sequentially
foreach(file_i = file_names) %do%
{
DT <- as.data.table(matrix(data = runif(n = nb_rows * nb_columns), nrow = nb_rows))
fwrite(x = DT, file = file_i)
flog.info("Creating file %s", file_i)
} -> tmp
# Load back the files
foreach(file_i = file_names, .final = rbindlist) %dopar%
{
flog.info("Loading file %s", file_i)
# >>> SOME PROGRESS BAR HERE <<<
fread(file_i)
} -> final_data
# show data
final_data
Desired output
Note that the progress bar is not messed up with the print lines)
INFO [2018-07-18 19:03:48] Loading file /tmp/RtmpB13Tko/file_197.txt
INFO [2018-07-18 19:03:48] Loading file /tmp/RtmpB13Tko/file_198.txt
INFO [2018-07-18 19:03:48] Loading file /tmp/RtmpB13Tko/file_199.txt
INFO [2018-07-18 19:03:48] Loading file /tmp/RtmpB13Tko/file_200.txt
[ =======> ] 4%
EDIT 2
After the bounty ended nothing comes close to the expected result.
Logging within the progress bar messes everything. If someone gets the correct result I'll give another result-based bounty.
Here's a solution (not perfect) using custom function.
This function outputs to console (using message
) progress bar.
ii
is current iteration.N
is total number of iterations to perform.per
is step (percent) when to update the progress bar. We need this as when multiple iterations are being performed progress bar gets updated too often and output is messed up.Function:
progBar <- function(ii, N, per = 10) {
if (ii %in% seq(1, N, per)) {
x <- round(ii * 100 / N)
message("[ ",
paste(rep("=", x), collapse = ""),
paste(rep("-", 100 - x), collapse = ""),
" ] ", x, "%", "\r",
appendLF = FALSE)
if (ii == N) cat("\r")
}
}
Code to test:
library(doMC)
library(foreach)
registerDoMC(10)
nIteration <- 1e3
foreach(i = 1:nIteration, ii = icount()) %dopar% {
# For progBar ii I'm using icount(), because
# user might iterate over all kind of objects
progBar(ii, nIteration)
Sys.sleep(1 / 10)
}
PS: It's not perfect, because:
100%
(depending on the number of iterations it can stop at 99%
)print
/cat
within foreach
You can refer to this link Progress bar parallel for the few insights (May be not the exact solution) which will help in creating a progress bar parallel.
The txtProgressBar
only works when the stype is 2 or 3
library("foreach")
library("doParallel")
library("progress")
registerDoParallel(parallel::makeCluster(7, outfile = ""))
pb <- progress_bar$new(
format = " [:bar] :percent in :elapsed",
total = 30, clear = FALSE, width = 80, force = T)
a <- foreach (i = 1:30) %dopar% {
pb$tick()
Sys.sleep(0.5)
}
pb <- txtProgressBar(title = "Iterative training", min = 0, max = 30, style = 3)
foreach (i = 1:30) %dopar% {
setTxtProgressBar(pb, i)
Sys.sleep(0.5)
}
Do refer this link Monitoring the function with progress bar for the different ways a progress bar can be implemented depending on the needs.
Using Multicore: You can register a different parallel backend later, or deregister doMC by registering the sequential backend by calling the registerDoSEQ function. For example consider the following program
> x <- iris[which(iris[,5] != "setosa"), c(1,5)]
> trials <- 10000
> ptime <- system.time({
+ r <- foreach(icount(trials), .combine=cbind) %dopar% {
+ ind <- sample(100, 100, replace=TRUE)
+ result1 <- glm(x[ind,2]~x[ind,1], family=binomial(logit))
+ coefficients(result1)
+ }
+ })[3]
> ptime
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With