Below is a MWE of my problem: I have programmed a progress bar for some function using the bootstrap (via the boot function from the boot package).
This works fine as long as I don't use parallel processing (res_1core
below). If I want to use parallel processing by setting parallel = "multicore"
and ncpus = 2
, the progress bar isn't displayed properly (res_2core
below).
library(boot)
rsq <- function(formula, data, R, parallel = c("no", "multicore", "snow"), ncpus = 1) {
env <- environment()
counter <- 0
progbar <- txtProgressBar(min = 0, max = R, style = 3)
bootfun <- function(formula, data, indices) {
d <- data[indices,]
fit <- lm(formula, data = d)
curVal <- get("counter", envir = env)
assign("counter", curVal + 1, envir = env)
setTxtProgressBar(get("progbar", envir = env), curVal + 1)
return(summary(fit)$r.square)
}
res <- boot(data = data, statistic = bootfun, R = R, formula = formula, parallel = parallel, ncpus = ncpus)
return(res)
}
res_1core <- rsq(mpg ~ wt + disp, data = mtcars, R = 1000)
res_2core <- rsq(mpg ~ wt + disp, data = mtcars, R = 1000, parallel = "multicore", ncpus = 2)
I have read that this is related to the fact that the boot function calls on lapply
for single core processing and mclapply
for multicore processing. Does anyone know of an easy workaround to deal with this? I mean, I would like to display the progress taking into account all of the parallel processes.
Update
Thanks to the input of Karolis Koncevičius, I have found a workaround (just use the updated rsq
function below):
rsq <- function(formula, data, R, parallel = c("no", "multicore", "snow"), ncpus = 1) {
bootfun <- function(formula, data, indices) {
d <- data[indices,]
fit <- lm(formula, data = d)
return(summary(fit)$r.square)
}
env <- environment()
counter <- 0
progbar <- txtProgressBar(min = 0, max = R, style = 3)
flush.console()
intfun <- function(formula, data, indices) {
curVal <- get("counter", envir = env) + ncpus
assign("counter", curVal, envir = env)
setTxtProgressBar(get("progbar", envir = env), curVal)
bootfun(formula, data, indices)
}
res <- boot(data = data, statistic = intfun, R = R, formula = formula, parallel = parallel, ncpus = ncpus)
return(res)
}
Unfortunately, this only works for multicore processing when I run R from the terminal. Any ideas how to patch this so it also displays properly in R console or Rstudio?
Not exactly what you ordered, but might be helpful.
A simple statistics function to boot:
library(boot)
bootfun <- function(formula, data, indices) {
d <- data[indices,]
fit <- lm(formula, data=d)
summary(fit)$r.square
}
Higher order function to display progress:
progressReporter <- function(total, nBars=100, f, ...) {
count <- 1
step <- ceiling(total/nBars)
cat(paste(rep("|", nBars), collapse=""), "\r")
flush.console()
function(...) {
if (count %% step==0) {
cat(".")
}
count <<- count + 1
f(...)
}
}
Now this function is cheating - it displays progress every "step" of iterations. If you have 1000 iterations, use two cores and print every 10th iteration - it will do the job. The cores don't share state, but they each will run the counter up to 500, and the function will respond to both counters.
On the other hand if you do 1000 iterations, run 10 cores and report every 200 - the function will stay silent, as all the cores will count to 100 each. None will reach 200 - no progress bar. Hope you get the idea. I think it should be ok in most of the cases.
Try it out:
res_1core <- boot(formula="mpg ~ wt + disp", data=mtcars, R=1000, statistic=progressReporter(1000, nBars=100, f=bootfun))
res_2core <- boot(formula="mpg ~ wt + disp", data=mtcars, R=1000, statistic=progressReporter(1000, nBars=100, f=bootfun), parallel="multicore", ncpus=2)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With