comprehensive way to check for functions that use the random number generator in an R script?

is there a smart way to identify all functions that use .Random.seed (the random number generator state within R) at any point in an R script?

use case: we have a dataset that changes constantly, both the records [rows] and the information [columns] - we add new records often, but we also update information in certain columns. so the dataset is constantly in flux. we fill in some missing data with an imputation, which requires random number generation with the sample() function. so whenever we add a new row or update any information in the column, the randomly imputed numbers all change -- which is expected. we use set.seed() at the start of each random imputation, so if a column changes but zero rows change, the other randomly-generated columns are not affected.
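a minimal sketch of the setup described above. the helper name `impute_by_sampling`, the seed values and the tiny data frame are all made up for illustration; this is not our real code, just the pattern of calling set.seed() right before each random imputation.

```r
# hypothetical helper: fill NAs by sampling from the observed values of the same column
impute_by_sampling = function (x, seed) {
    set.seed(seed)  # reset the RNG state so this column does not depend on earlier draws
    is_missing = is.na(x)
    observed = x[! is_missing]
    # index with sample.int() because sample(x) treats a single number x as 1:x
    x[is_missing] = observed[sample.int(length(observed), sum(is_missing), replace = TRUE)]
    x
}

# made-up data, for illustration only
df = data.frame(age = c(31, NA, 45, NA, 52), income = c(NA, 40, 55, NA, 61))

first = impute_by_sampling(df$age, seed = 1)
df$income[2] = 42  # update a value in a different column
second = impute_by_sampling(df$age, seed = 1)

identical(first, second)  # TRUE: same seed and same input give the same imputed values
```

the imputed values only stay stable as long as nothing else draws random numbers between set.seed() and the sampling step, which is why i want to find every function that touches the seed.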

i'm under the impression that the only function within our entire codebase that ever touches a random seed is the sample() function, but i would like to verify this somehow?

edit: even something that prints a function call whenever the random number state gets touched would be helpful, the same way debug() comes to life whenever the debugged function gets triggered? for our purposes, it is pretty safe to assume that if we run our script once for dynamic evaluation and no other random functions get triggered, then we are safe.

thanks

Notwithstanding my comment, here’s a brute force way of checking this:

rm(.Random.seed) # if it already exists
makeActiveBinding('.Random.seed',
                  function () stop('Something touched my seed', call. = FALSE),
                  globalenv())

This will make .Random.seed into an active binding that throws an error when it’s touched.
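As a rough sketch of how this trap might be used, the snippet below wraps a piece of code in tryCatch so the error can be inspected instead of aborting the session. The call to sample(5) is only a stand-in for the code under inspection, and the existence check before rm is an added precaution for sessions where no seed has been created yet.

```r
# Remove any existing seed so the name is free for the active binding.
if (exists('.Random.seed', envir = globalenv(), inherits = FALSE)) {
    rm('.Random.seed', envir = globalenv())
}

makeActiveBinding('.Random.seed',
                  function () stop('Something touched my seed', call. = FALSE),
                  globalenv())

# Stand-in for the code under inspection: report whether it reached for the seed.
result = tryCatch(
    {
        sample(5)
        'RNG not used'
    },
    error = function (e) conditionMessage(e)
)
print(result)

# Remove the trap again so that random number generation works normally afterwards.
rm('.Random.seed', envir = globalenv())
```

Note that removing the binding discards the previous seed, so R will initialise a fresh one the next time a random number is requested.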

This works but it’s very disruptive. Here’s a gentler variant. It has a few interesting features:

It allows enabling and disabling debugging of .Random.seed
It supports getting and setting the seed
It logs the call but doesn’t stop execution
It maintains a “whitelist” of top-level calls that shouldn’t be logged

With this you can write the following code, for instance:

# Ignore calls coming from sample.int
> debug_random_seed(ignore = sample.int)

> sample(5)
Getting .Random.seed
Called from sample(5)
Setting .Random.seed
Called from sample(5)
[1] 3 5 4 1 2

> sample.int(5)
[1] 5 1 2 4 3

> undebug_random_seed()

> sample(5)
[1] 2 1 5 3 4

Here is the implementation in all its glory:

debug_random_seed = local({
    function (ignore = list()) {
        seed_scope = parent.env(environment())

        if (is.function(ignore)) ignore = list(ignore)

        if (exists('.Random.seed', globalenv())) {
            if (bindingIsActive('.Random.seed', globalenv())) {
                warning('.Random.seed is already being debugged')
                return(invisible())
            }
        } else {
            set.seed(NULL)
        }

        # Save existing seed before deleting
        assign('random_seed', .Random.seed, seed_scope)
        rm(.Random.seed, envir = globalenv())

        debug_seed = function (new_value) {
            if (sys.nframe() > 1 &&
                ! any(vapply(ignore, identical, logical(1), sys.function(1)))
            ) {
                if (missing(new_value)) {
                    message('Getting .Random.seed')
                } else {
                    message('Setting .Random.seed')
                }
                message('Called from ', deparse(sys.call(1)))
            }

            if (! missing(new_value)) {
                assign('random_seed', new_value, seed_scope)
            }

            random_seed
        }

        makeActiveBinding('.Random.seed', debug_seed, globalenv())
    }
})

undebug_random_seed = function () {
    if (! (exists('.Random.seed', globalenv()) &&
           bindingIsActive('.Random.seed', globalenv()))) {
        warning('.Random.seed is not being debugged')
        return(invisible())
    }

    seed = suppressMessages(.Random.seed)
    rm('.Random.seed', envir = globalenv())
    assign('.Random.seed', seed, globalenv())
}

Some notes about the code:

The debug_random_seed function is defined inside its own private environment. This environment is designated by seed_scope in the code. This prevents leaking the private random_seed variable into the global environment.
The function defensively checks whether debugging is already enabled. Overkill maybe.
Debug information is only printed when the seed is accessed within a function call. If the user inspects .Random.seed directly on the R console, no logging occurs.
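The private-environment pattern from the first note can be illustrated in isolation. The following is a small sketch with made-up names (`next_id`, `last_id`) that are not part of the code above; it only shows how local() keeps state out of the global environment while a function defined inside it can still read and update that state.

```r
next_id = local({
    last_id = 0  # private state, lives only in the environment created by local()

    function () {
        # The enclosing environment of this call is the private local() environment.
        scope = parent.env(environment())
        assign('last_id', last_id + 1, scope)
        last_id  # looked up lexically, so this sees the updated value
    }
})

first = next_id()
second = next_id()
print(c(first, second))

# The state is not visible as a global variable.
exists('last_id', envir = globalenv(), inherits = FALSE)
```

In debug_random_seed the same mechanism stores random_seed, so the saved seed survives between calls without appearing in the user's workspace.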

Tags: random, r, random-seed

Anthony Damico

1 Answer

Konrad Rudolph
