
Why would an R package load random numbers?

Recently, I was reading the documentation for the caret package when I noticed this:

Also, please note that some packages load random numbers when loaded (directly or via namespace) and this may effect [sic] reproducibility.

What are possible use cases for packages loading random numbers? This seems counter to the idea of reproducible research and might interfere with my own calls to set.seed(). (I've started setting seeds closer to the code that requires random number generation precisely because I'm worried about the side effects of loading packages.)

Sean Raleigh, asked Apr 04 '18 00:04


1 Answer

One example of a package that does this is ggplot2, as mentioned by Hadley Wickham in a response to a GitHub issue related to tidyverse.

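When the package is attached in an interactive session, a tip is randomly selected and displayed to the user (and most of the time, whenever runif(1) exceeds 0.1, no tip is displayed at all). If we examine its .onAttach() function as it existed before January 2018, we see that it calls runif() and, when a tip is shown, sample(). Each call advances R's random number generator, so anything drawn after an earlier set.seed() comes out differently: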

.onAttach <- function(...) {
  if (!interactive() || stats::runif(1) > 0.1) return()

  tips <- c(
    "Need help? Try the ggplot2 mailing list: http://groups.google.com/group/ggplot2.",
    "Find out what's changed in ggplot2 at http://github.com/tidyverse/ggplot2/releases.",
    "Use suppressPackageStartupMessages() to eliminate package startup messages.",
    "Stackoverflow is a great place to get help: http://stackoverflow.com/tags/ggplot2.",
    "Need help getting started? Try the cookbook for R: http://www.cookbook-r.com/Graphs/",
    "Want to understand how all the pieces fit together? Buy the ggplot2 book: http://ggplot2.org/book/"
  )

  tip <- sample(tips, 1)
  packageStartupMessage(paste(strwrap(tip), collapse = "\n"))
}

release_questions <- function() {
  c(
    "Have you built the book?"
  )
}
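
To see why this matters, here is a minimal sketch of the effect in base R. The function noisy_attach() is a hypothetical stand-in for a package's attach hook, not ggplot2's code; it simply draws one uniform number, as the hook above does in an interactive session.

```r
# Hypothetical stand-in for an attach hook that consumes a random number.
noisy_attach <- function() {
  invisible(stats::runif(1))
}

# Reference run: seed, then draw.
set.seed(42)
baseline <- rnorm(3)

# Same seed, but an RNG draw happens between set.seed() and our code,
# as it would if a package were attached at this point.
set.seed(42)
noisy_attach()
shifted <- rnorm(3)

# Same seed with nothing in between reproduces the reference run.
set.seed(42)
repeated <- rnorm(3)

identical(baseline, repeated)  # TRUE
identical(baseline, shifted)   # FALSE
```

The intervening draw moves the generator forward, so the values that follow no longer match the reference run even though the seed was the same.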

However, ggplot2's .onAttach() has since been changed in a commit authored by Jim Hester, so that the state of the random number generator is restored after ggplot2 is attached:

.onAttach <- function(...) {
  withr::with_preserve_seed({
    if (!interactive() || stats::runif(1) > 0.1) return()

    tips <- c(
      "Need help? Try the ggplot2 mailing list: http://groups.google.com/group/ggplot2.",
      "Find out what's changed in ggplot2 at http://github.com/tidyverse/ggplot2/releases.",
      "Use suppressPackageStartupMessages() to eliminate package startup messages.",
      "Stackoverflow is a great place to get help: http://stackoverflow.com/tags/ggplot2.",
      "Need help getting started? Try the cookbook for R: http://www.cookbook-r.com/Graphs/",
      "Want to understand how all the pieces fit together? Buy the ggplot2 book: http://ggplot2.org/book/"
    )

    tip <- sample(tips, 1)
    packageStartupMessage(paste(strwrap(tip), collapse = "\n"))
  })
}
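
For intuition, the idea behind preserving the seed can be sketched in base R: save .Random.seed before running the code and put it back afterwards. The helper preserve_seed() below is hypothetical and simplified; it is not withr's actual implementation.

```r
# Hypothetical helper: evaluate `code`, then restore the RNG state
# that existed before the call. A simplified sketch, not withr's code.
preserve_seed <- function(code) {
  had_seed <- exists(".Random.seed", envir = globalenv(), inherits = FALSE)
  if (had_seed) {
    old_seed <- get(".Random.seed", envir = globalenv(), inherits = FALSE)
  }
  on.exit({
    if (had_seed) {
      # Put the saved state back where R looks for it.
      assign(".Random.seed", old_seed, envir = globalenv())
    } else if (exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
      # No state existed before, so remove the one `code` created.
      rm(".Random.seed", envir = globalenv())
    }
  })
  force(code)  # `code` is evaluated lazily, i.e. only here
}

set.seed(1)
expected <- runif(2)

set.seed(1)
preserve_seed(sample(10, 1))  # consumes random numbers, then state is restored
actual <- runif(2)

identical(expected, actual)  # TRUE
```

Under the same assumptions, a user who does not control the package could wrap the library() call in such a helper, or simply call set.seed() only after all packages have been loaded.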

So a package may have various reasons for drawing random numbers when it is loaded or attached, but package authors can keep this from having unexpected consequences for the user, for example by restoring the RNG state afterwards.

duckmayr, answered Oct 27 '22 04:10