Parallelization with "future" always uses all cores

Question

I am following the documentation of the shapr package (version 1.0.5) to use parallelization, but whatever number of "workers" I specify, future always uses all available processor cores.

Something else seems to be off, too: when I call plot() on the result (which should dispatch to plot.shapr()), nothing is plotted, whereas calling plot.default() creates a plot. After running my script from the R shell with source(...), I can plot the result interactively once the script has returned, but when I include plot(shap, ...) anywhere in the script, nothing is plotted.

Suggestions for resolving these two problems are welcome.

Here is my code:

library(shapr)
library(future)

fit <- lm(rating ~ ., attitude)

# features without response
x <- attitude[,names(attitude)!="rating"]

local <- FALSE

if (local) {
  # not of interest here  
} else {  # (!local)

  future::plan(multisession, workers=2)  # <-- number of workers is ignored

  shap <- shapr::explain(model=fit, predict_model=predict.lm,
                         x_explain=x, x_train=x,
                         approach="gaussian", phi0=mean(fit$fitted.values),
                         verbose=NULL)

  future::plan(sequential)

  plot(1:5,sin(1:5))                # <-- plots something
  plot(shap, plot_type="beeswarm")  # <-- has no effect
}

Edit: It seems that shapr::explain() loads future automatically: before the call to shapr::explain(), "future" is not listed by sessionInfo(), but afterwards it is. Presumably, the function sets the number of workers to the maximum available and resets it afterwards. This is just speculation, though, because I could not find anything like this in explain.r in the package's GitHub repository.

cdalitz · Accepted Answer

It turned out that this was a problem with neither future nor shapr.

Some of the internal functions that get called use OpenMP. How many processor cores these functions use depends on the environment variable OMP_NUM_THREADS; if it is unset, all available cores are used. Setting this variable from within R with Sys.setenv(), however, has no effect. Instead, the variable must be set in the shell before starting R ($ is the shell prompt):

$ OMP_NUM_THREADS=1 R
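
To check from inside R which settings the session actually sees, something like the following sketch may help. It only reads values and changes nothing. The point it illustrates is that the number of future workers and the number of OpenMP threads are independent settings; the variable names are illustrative, and the future query is skipped if the package is not installed.

```r
# Value of OMP_NUM_THREADS inherited from the shell (NA if it is unset)
omp_threads <- Sys.getenv("OMP_NUM_THREADS", unset = NA)

# Number of cores R detects on this machine (can be NA on some platforms)
n_cores <- parallel::detectCores()

cat("OMP_NUM_THREADS:", if (is.na(omp_threads)) "<unset>" else omp_threads, "\n")
cat("Detected cores: ", n_cores, "\n")

# Workers of the current future plan; this is separate from OpenMP threads
if (requireNamespace("future", quietly = TRUE)) {
  cat("future workers: ", future::nbrOfWorkers(), "\n")
}
```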

Interestingly, explain() runs about twice as fast on a single core as on all 24 available cores. Using 4 or 8 cores is also slower than using only one core.
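
Because the variable has to be set before R starts, a timing comparison like this needs one fresh R process per setting. A minimal sketch of how that could be scripted, assuming a Unix-like system: time_with_threads is a hypothetical helper and my_script.R a placeholder for the script containing the explain() call; neither comes from shapr or future.

```r
# Hypothetical helper: run an R script in a fresh Rscript process with
# OMP_NUM_THREADS set for that process, and return the elapsed seconds.
time_with_threads <- function(script, n_threads) {
  rscript <- file.path(R.home("bin"), "Rscript")
  timing <- system.time(
    system2(rscript, shQuote(script),
            env = paste0("OMP_NUM_THREADS=", n_threads),
            stdout = FALSE, stderr = FALSE)
  )
  # Includes the startup time of the child R process
  timing[["elapsed"]]
}

# Usage (my_script.R is a placeholder):
# sapply(c(1, 4, 8, 24), function(n) time_with_threads("my_script.R", n))
```

Each measurement includes R's own startup time, so the comparison is only meaningful when the script runs much longer than that.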

The other problem, the plot not appearing, was a ggplot2 issue; ggplot2 is used internally by shapr for plotting. plot.shapr() returns a ggplot object, which is only displayed when its print method is called. This does not work, however, while a base graphics plot is still being shown. The problem is thus solved as follows:

if (!is.null(dev.list())) dev.off()
print(plot(shap, plot_type="beeswarm"))
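
Why the explicit print() matters can be illustrated in base R without ggplot2. This is a sketch under an assumption not confirmed above, namely that the script was run with source() and its default arguments: source() then evaluates top-level expressions without auto-printing their values, and an object that is only drawn by its print method stays invisible.

```r
# Write a one-line script whose only statement is a bare expression
script <- tempfile(fileext = ".R")
writeLines("1 + 1", script)

# Default source(): the value is computed but not printed
silent <- capture.output(source(script))

# print.eval = TRUE makes source() print top-level values
shown <- capture.output(source(script, print.eval = TRUE))

length(silent)  # number of captured output lines without printing
shown           # captured output with printing enabled

unlink(script)
```

Wrapping the call in print(), as in the fix above, does not depend on how the script is run.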

Tags:

r

shap

r-future
