I have a computation like this (please note that this is a very simplified, cut-down version, the smallest reproducible example):
computation <- function() # simplified version!
{
# a lot of big matrices here....
big_matrix <- matrix(rnorm(2000*2000), nrow = 2000, ncol = 2000)
exp.value <- 4.5
prior <- function (x) rep(exp.value, nrow(x))
# after computation, it returns the model
list(
some_info = 5.18,
prior = prior
)
}
This function fits and returns a model, which I want to save to disk:
m <- computation()
save(m, file = "tmp.Rdata")
file.info("tmp.Rdata")$size
# [1] 30713946
Unfortunately, as you can see, the file is too large, because it contains the whole closure of the function prior(), and this closure contains all the data from the computation() function, including big_matrix (there are lots of them in my full code).
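One way to confirm what the closure carries along is to list the contents of the function's environment. The following is only a sketch: it uses a smaller matrix than above so it runs quickly, and the names computation_small, m_small, captured and sizes are made up for illustration.

```r
# Hypothetical, smaller variant of computation() for quick inspection
computation_small <- function() {
  big_matrix <- matrix(rnorm(200 * 200), nrow = 200, ncol = 200)
  exp.value <- 4.5
  prior <- function(x) rep(exp.value, nrow(x))
  list(some_info = 5.18, prior = prior)
}

m_small <- computation_small()

# Everything defined inside the function body lives in the closure's environment
captured <- sort(ls(environment(m_small$prior)))
print(captured)

# Approximate in-memory size (in bytes) of each captured object
sizes <- sapply(captured, function(nm) {
  as.numeric(object.size(get(nm, envir = environment(m_small$prior))))
})
print(sizes)
```

If big_matrix shows up in the listing, it will be written to disk together with prior.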
Now, I tried to fix it by redefining the environment (closure) of the prior function using environment(prior) <- list2env(list(exp.value = exp.value)):
exp.value <- 4.5
environment(m$prior) <- list2env(list(exp.value = exp.value))
save(m, file = "tmp.Rdata")
file.info("tmp.Rdata")$size
# [1] 475
This works as expected! Unfortunately, when I put this cleanup code into the computation() function (in fact, when I put it into any function), it stops working! See:
computation <- function() # simplified version!
{
# a lot of big matrices here....
big_matrix <- matrix(rnorm(2000*2000), nrow = 2000, ncol = 2000)
exp.value <- 4.5
prior <- function (x) rep(exp.value, nrow(x))
environment(prior) <- list2env(list(exp.value = exp.value)) # this is the update
# after computation, it returns the model
list(
some_info = 5.18,
prior = prior
)
}
m <- computation()
save(m, file = "tmp.Rdata")
file.info("tmp.Rdata")$size
# [1] 30713151
The file is huge again; the closure was not cleaned up correctly.
One way to fix the problem is to remove the large variable from the environment before returning:
computation <- function()
{
big_matrix <- matrix(rnorm(2000*2000), nrow = 2000, ncol = 2000)
exp.value <- 4.5
prior <- function (x) rep(exp.value, nrow(x))
rm(big_matrix) ## remove variable
list(
some_info = 5.18,
prior = prior
)
}
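To check the effect of rm() without writing a file, one option is to compare the serialized size of the result in memory. This is a rough sketch with a smaller matrix; the function names without_rm and with_rm are made up for illustration, and the exact byte counts will vary between R versions and sessions.

```r
# Hypothetical variant that keeps the large object in the closure's environment
without_rm <- function() {
  big_matrix <- matrix(rnorm(200 * 200), nrow = 200, ncol = 200)
  exp.value <- 4.5
  prior <- function(x) rep(exp.value, nrow(x))
  list(some_info = 5.18, prior = prior)
}

# Hypothetical variant that removes the large object before returning
with_rm <- function() {
  big_matrix <- matrix(rnorm(200 * 200), nrow = 200, ncol = 200)
  exp.value <- 4.5
  prior <- function(x) rep(exp.value, nrow(x))
  rm(big_matrix)
  list(some_info = 5.18, prior = prior)
}

# serialize(x, NULL) returns a raw vector; its length is the size in bytes
size_without <- length(serialize(without_rm(), NULL))
size_with <- length(serialize(with_rm(), NULL))
print(c(without_rm = size_without, with_rm = size_with))
```

Note that with this approach every large object has to be removed by name, so it can become tedious when there are many of them.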
The problem with your list2env method is that, by default, it uses the calling environment (here, the body of computation()) as the parent of the new environment, so you are capturing everything inside the function anyway. You can instead specify the global environment as the parent:
computation <- function()
{
big_matrix <- matrix(rnorm(2000*2000), nrow = 2000, ncol = 2000)
exp.value <- 4.5
prior <- function (x) rep(exp.value, nrow(x))
# explicit parent
environment(prior) <- list2env(list(exp.value = exp.value), parent=globalenv())
list(
some_info = 5.18,
prior = prior
)
}
(If you specify emptyenv() instead, the function won't be able to find built-in functions like rep().)
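The difference between the three parent choices can be checked directly. This is a minimal sketch, assuming nothing named big_object exists in your global environment; all function and variable names here are made up for illustration.

```r
# Default parent: list2env() called inside a function uses that function's frame
make_env_default <- function() {
  big_object <- rnorm(10)  # stands in for the large matrices
  e <- list2env(list(exp.value = 4.5))
  list(env = e, frame = environment())
}
res <- make_env_default()
default_parent_is_frame <- identical(parent.env(res$env), res$frame)
sees_big_object <- exists("big_object", envir = res$env)

# Explicit parent: the new environment no longer reaches the function's frame
make_env_global <- function() {
  big_object <- rnorm(10)
  list2env(list(exp.value = 4.5), parent = globalenv())
}
e_global <- make_env_global()
global_parent <- identical(parent.env(e_global), globalenv())
hides_big_object <- !exists("big_object", envir = e_global)

# With globalenv() as parent, base functions such as rep() are still found
f_global <- function(x) rep(exp.value, nrow(x))
environment(f_global) <- e_global
global_result <- f_global(matrix(0, nrow = 2, ncol = 2))

# With emptyenv() as parent, even rep() cannot be looked up
f_empty <- function(x) rep(exp.value, nrow(x))
environment(f_empty) <- list2env(list(exp.value = 4.5), parent = emptyenv())
empty_result <- tryCatch(
  f_empty(matrix(0, nrow = 2, ncol = 2)),
  error = function(e) conditionMessage(e)
)

print(c(default_parent_is_frame, sees_big_object, global_parent, hides_big_object))
print(global_result)
print(empty_result)
```

The first block shows why the original attempt still saved the big objects: the new environment's parent is the function's frame, and saving an environment saves its parents too, up to the global environment.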