I am following the documentation of the shapr package (version 1.0.5) to use parallelization, but whatever number of "workers" I specify, future always uses all available processor cores.
Something else fishy seems to be going on as well: when I call plot() on the result (which should dispatch to plot.shapr()), nothing is plotted, whereas calling plot.default() creates a plot. After I run my script from the R shell with source(...) and it returns, I can plot the result interactively, but when I include plot(shap, ...) anywhere in the script, nothing is plotted.
Suggestions for resolving these two problems are welcome.
Here is my code:
library(shapr)
library(future)
fit <- lm(rating ~ ., attitude)
# features without response
x <- attitude[,names(attitude)!="rating"]
local <- FALSE
if (local) {
  # not of interest here
} else { # (!local)
  future::plan(multisession, workers=2) # <-- number of workers is ignored
  shap <- shapr::explain(model=fit, predict_model=predict.lm,
                         x_explain=x, x_train=x,
                         approach="gaussian", phi0=mean(fit$fitted.values),
                         verbose=NULL)
  future::plan(sequential)
  plot(1:5,sin(1:5)) # <-- plots something
  plot(shap, plot_type="beeswarm") # <-- has no effect
}
Edit: It seems that shapr::explain() loads future automatically: before the call to shapr::explain(), "future" is not listed by sessionInfo(), but afterwards it is. Presumably, the function sets the number of workers to the maximum available and resets it afterwards. This is just speculation, though, because I could not find anything like this in explain.r in the package's GitHub repository.
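One way to check whether the plan itself honours the requested worker count is to ask future directly. This is only a minimal sketch, assuming the future package is installed; nbrOfWorkers() reports the workers of the currently active plan, and the variable name n_workers is purely illustrative:

```r
library(future)

# Request two background R sessions, as in the script above.
plan(multisession, workers = 2)

# Ask future how many workers the active plan provides.
n_workers <- nbrOfWorkers()
cat("Workers in active plan:", n_workers, "\n")

# Check whether the future namespace is loaded in this session.
cat("future loaded:", "future" %in% loadedNamespaces(), "\n")

# Return to sequential processing, which also shuts down the background sessions.
plan(sequential)
```

If this reports the requested number while more cores are still busy, the extra load is probably coming from somewhere other than future.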
It turned out that this was neither a problem of future nor of shapr.
Some of the internal functions that get called use OpenMP. How many processor cores these functions use depends on the environment variable OMP_NUM_THREADS. If it is unset, all available cores are allocated. Setting this environment variable from within R with Sys.setenv(), however, has no effect. Instead, the variable must be set in the shell before starting R ($ is the shell prompt):
$ OMP_NUM_THREADS=1 R
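To confirm that the value set in the shell actually arrived in the R session, it can be read back with Sys.getenv(). A small sketch using only base R and the parallel package; the variable names omp_threads and n_cores are illustrative:

```r
# Read the variable as R sees it; NA means it was not set in the shell.
omp_threads <- Sys.getenv("OMP_NUM_THREADS", unset = NA)

if (is.na(omp_threads)) {
  message("OMP_NUM_THREADS is unset")
} else {
  message("OMP_NUM_THREADS = ", omp_threads)
}

# For comparison: number of logical cores detected on this machine.
n_cores <- parallel::detectCores()
message("Detected cores: ", n_cores)
```

This only shows what the session inherited from the shell; it does not by itself prove how many threads the OpenMP code ends up using.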
Interestingly, explain() runs about twice as fast with only one core as with all 24 available cores. Using 4 or 8 cores is also slower than using a single core.
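A comparison like this can be reproduced by timing the explain() call from the question with system.time() and running the script once per thread setting. This is a rough sketch, not a careful benchmark; the script file name time_explain.R is hypothetical, and the call is skipped if shapr is not installed:

```r
# Run once per setting from the shell, for example:
#   $ OMP_NUM_THREADS=1 Rscript time_explain.R   (file name is hypothetical)
if (requireNamespace("shapr", quietly = TRUE)) {
  fit <- lm(rating ~ ., attitude)
  # features without response
  x <- attitude[, names(attitude) != "rating"]

  # Time the same explain() call as in the question.
  timing <- system.time(
    shap <- shapr::explain(model = fit, predict_model = predict.lm,
                           x_explain = x, x_train = x,
                           approach = "gaussian",
                           phi0 = mean(fit$fitted.values),
                           verbose = NULL)
  )

  cat("OMP_NUM_THREADS:", Sys.getenv("OMP_NUM_THREADS", unset = "<unset>"),
      " elapsed seconds:", timing[["elapsed"]], "\n")
} else {
  message("shapr is not installed; nothing to time")
}
```

Single runs on a data set this small are noisy, so repeating each setting a few times gives a more trustworthy picture.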
The other problem, the plot not appearing, was a ggplot2 issue; ggplot2 is the library shapr uses internally for plotting. plot.shapr() returns a ggplot object, which is only displayed when its print method is called. This does not work, however, while a base graphics plot is still displayed. The problem is thus solved as follows:
if (!is.null(dev.list())) dev.off()
print(plot(shap, plot_type="beeswarm"))
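The same behaviour can be seen with plain ggplot2, independent of shapr. A minimal sketch, assuming ggplot2 is installed and using the built-in mtcars data purely as a stand-in; the output file name is illustrative:

```r
library(ggplot2)

# Build a ggplot object; nothing is drawn yet.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

# At the interactive prompt, typing `p` auto-prints it. Inside source()
# with default arguments, a loop, or a function there is no auto-printing,
# so the plot has to be printed explicitly.
print(p)

# Alternative that does not depend on an open graphics window:
# write the plot to a file.
out_file <- file.path(tempdir(), "example_plot.pdf")
ggsave(out_file, plot = p, width = 5, height = 4)
```

Calling source() with print.eval=TRUE is another way to get top-level results of a script printed, at the cost of printing every other top-level value as well.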