is there a smart way to identify all functions that use .Random.seed
(the random number generator state within R) at any point in an R script?
use case: we have a dataset that changes constantly, both the records [rows] and the information [columns] - we add new records often, but we also update information in certain columns. so the dataset is constantly in flux. we fill in some missing data with an imputation, which requires random number generation with the sample() function. so whenever we add a new row or update any information in the column, the randomly imputed numbers all change -- which is expected. we use set.seed() at the start of each random imputation, so if a column changes but zero rows change, the other randomly-generated columns are not affected.
i'm under the impression that the only function within our entire codebase that ever touches a random seed is the sample() function, but i would like to verify this somehow?
edit: even something that prints a function call whenever the random number state gets touched would be helpful, the same way debug() comes to life whenever the debugged function gets triggered? for our purposes, it is pretty safe to assume that if we run our script once for dynamic evaluation and no other random functions get triggered, then we are safe.
thanks
Some background on random number generators: the default algorithm in R is Mersenne-Twister, but a long list of other methods is available.
The generator takes a seed value and then generates numbers that “look” random. The catch: if you give the random number generator the same seed value, it gives the same pseudorandom values.
To make random draws reproducible, call the set.seed() function before the code that draws them. With the same seed and the same RNG settings (see RNGkind()), each random function then returns the same values every time it is called. Note that the default sampling method used by sample() changed in R 3.6.0, so results can differ between older and newer R versions unless the settings are matched.
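To make the reproducibility point concrete, here is a minimal sketch. The seed value 42 and the variable names are arbitrary choices for illustration. It also shows that any random draw changes the generator state stored in .Random.seed, which is what the question is trying to detect.

```r
# Re-seeding with the same value replays the same pseudorandom draws.
set.seed(42)
first_draw = sample(10)

set.seed(42)
second_draw = sample(10)

same_draws = identical(first_draw, second_draw)  # TRUE: same seed, same permutation

# Any random draw advances the state kept in the global .Random.seed.
state_before = get('.Random.seed', envir = globalenv())
invisible(runif(1))
state_after = get('.Random.seed', envir = globalenv())

state_unchanged = identical(state_before, state_after)  # FALSE: runif() touched the state
```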
Notwithstanding my comment, here’s a brute force way of checking this:
rm(.Random.seed) # if it already exists
makeActiveBinding('.Random.seed',
    function () stop('Something touched my seed', call. = FALSE),
    globalenv())
This will make .Random.seed into an active binding that throws an error when it’s touched.
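For readers who have not used active bindings before, the sketch below shows the same mechanism in a scratch environment, so that the real .Random.seed is left alone. The names scratch and guarded are made up for this illustration.

```r
# A scratch environment, so the global .Random.seed is not modified.
scratch = new.env()

# 'guarded' is a hypothetical name. Every access to it runs the function,
# which raises an error instead of returning a value.
makeActiveBinding(
    'guarded',
    function () stop('Something touched my value', call. = FALSE),
    scratch
)

is_active = bindingIsActive('guarded', scratch)

# Reading the binding triggers the error, which is caught here and
# turned into its message text.
result = tryCatch(
    get('guarded', envir = scratch),
    error = function (e) conditionMessage(e)
)
```

Because the error is raised on every access, execution stops at the first function that reads the value, which is why this variant is described as brute force.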
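This works but it’s very disruptive. Here’s a gentler variant. It has a few interesting features: debugging of .Random.seed can be switched on and off, both reads and writes of the seed are logged together with the top-level call that caused them, execution is not interrupted, and you can pass functions whose calls should not be logged.
With this you can write the following code, for instance: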
# Ignore calls coming from sample.int
> debug_random_seed(ignore = sample.int)
> sample(5)
Getting .Random.seed
Called from sample(5)
Setting .Random.seed
Called from sample(5)
[1] 3 5 4 1 2
> sample.int(5)
[1] 5 1 2 4 3
> undebug_random_seed()
> sample(5)
[1] 2 1 5 3 4
Here is the implementation in all its glory:
debug_random_seed = local({
    # 'ignore' defaults to an empty list so the function can be called without arguments
    function (ignore = list()) {
        # The enclosing local() environment, which holds the saved seed
        seed_scope = parent.env(environment())

        if (is.function(ignore)) ignore = list(ignore)

        if (exists('.Random.seed', globalenv())) {
            if (bindingIsActive('.Random.seed', globalenv())) {
                warning('.Random.seed is already being debugged')
                return(invisible())
            }
        } else {
            set.seed(NULL)
        }

        # Save existing seed before deleting
        assign('random_seed', .Random.seed, seed_scope)
        rm(.Random.seed, envir = globalenv())

        debug_seed = function (new_value) {
            # Log only when called from inside a function that is not ignored
            if (sys.nframe() > 1 &&
                ! any(vapply(ignore, identical, logical(1), sys.function(1)))
            ) {
                if (missing(new_value)) {
                    message('Getting .Random.seed')
                } else {
                    message('Setting .Random.seed')
                }
                message('Called from ', deparse(sys.call(1)))
            }

            if (! missing(new_value)) {
                assign('random_seed', new_value, seed_scope)
            }

            random_seed
        }

        makeActiveBinding('.Random.seed', debug_seed, globalenv())
    }
})
undebug_random_seed = function () {
    if (! (exists('.Random.seed', globalenv()) &&
           bindingIsActive('.Random.seed', globalenv()))) {
        warning('.Random.seed is not being debugged')
        return(invisible())
    }

    seed = suppressMessages(.Random.seed)
    rm('.Random.seed', envir = globalenv())
    assign('.Random.seed', seed, globalenv())
}
Some notes about the code:
- The debug_random_seed function is defined inside its own private environment. This environment is designated by seed_scope in the code. This prevents leaking the private random_seed variable into the global environment.
- When .Random.seed is accessed directly on the R console, no logging occurs (this is what the sys.nframe() > 1 check is for).
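The private-environment pattern from the first note can be seen in isolation in the sketch below. The names next_id and private_count are hypothetical and exist only for this illustration. The function is created inside local(), so its state lives in that environment instead of the global one.

```r
# local() evaluates its body in a fresh environment and returns the last
# value, here a function whose enclosing environment is that fresh one.
next_id = local({
    private_count = 0  # state kept in the private environment

    function () {
        # Same idiom as seed_scope: the environment the function was defined in
        scope = parent.env(environment())
        assign('private_count', get('private_count', scope) + 1, scope)
        get('private_count', scope)
    }
})

first = next_id()   # 1
second = next_id()  # 2

# The counter variable is not visible in the global environment.
leaked = exists('private_count', envir = globalenv(), inherits = FALSE)
```

The state persists between calls because the function keeps a reference to the environment it was created in, which is the same reason the saved seed survives between accesses to the active binding.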