
How to clean up the function closure (environment) when returning and saving it?

Tags: closures, r

I have a computation like this (please note that this is a heavily simplified, cut-down version: the smallest reproducible example):

computation <- function() # simplified version!
{
    # a lot of big matrices here....
    big_matrix <- matrix(rnorm(2000*2000), nrow = 2000, ncol = 2000)

    exp.value <- 4.5
    prior <- function (x) rep(exp.value, nrow(x))

    # after computation, it returns the model
    list(
        some_info = 5.18,
        prior = prior
    )
}

This function fits and returns a model, which I want to save to disk:

m <- computation()
save(m, file = "tmp.Rdata")
file.info("tmp.Rdata")$size
# [1] 30713946

Unfortunately, as you can see, the file is too large: it contains the whole closure environment of the function prior(), and that environment holds all the data from the computation() function, including big_matrix (there are lots of such matrices in my full code).
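A quick way to confirm what the closure drags along is to list the contents of the function's environment. This is a minimal sketch that uses a smaller matrix than the original so it runs quickly; env_names is a name introduced here for illustration only.

```r
# Same shape as computation() above, with a smaller matrix for speed.
computation <- function() {
    big_matrix <- matrix(rnorm(200 * 200), nrow = 200, ncol = 200)

    exp.value <- 4.5
    prior <- function(x) rep(exp.value, nrow(x))

    list(some_info = 5.18, prior = prior)
}

m <- computation()

# environment(m$prior) is the execution environment of computation(),
# so every local variable of that call is still reachable from it.
env_names <- ls(environment(m$prior))
print(env_names)
```

If big_matrix shows up in the listing, it will be written to disk along with the function.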

Now, I tried to fix it by redefining the environment (closure) of the prior function using environment(prior) <- list2env(list(exp.value = exp.value)):

exp.value <- 4.5
environment(m$prior) <- list2env(list(exp.value = exp.value))
save(m, file = "tmp.Rdata")
file.info("tmp.Rdata")$size
# [1] 475

This works as expected! Unfortunately, when I put this clean-up code into the computation() function (in fact, when I put it into any function), it stops working! See:

computation <- function() # simplified version!
{
    # a lot of big matrices here....
    big_matrix <- matrix(rnorm(2000*2000), nrow = 2000, ncol = 2000)

    exp.value <- 4.5
    prior <- function (x) rep(exp.value, nrow(x))
    environment(prior) <- list2env(list(exp.value = exp.value)) # this is the update

    # after computation, it returns the model
    list(
        some_info = 5.18,
        prior = prior
    )
}
m <- computation()
save(m, file = "tmp.Rdata")
file.info("tmp.Rdata")$size
# [1] 30713151

The file is huge again; the closure was not cleaned up correctly.

  1. What is going on here? Why does the clean-up code work when run outside of any function but stop working inside a function?
  2. How to make it work inside a function?
Asked by Tomas, Nov 14 '19 21:11


1 Answer

One way to fix the problem is to remove the large variable from the environment before returning.

computation <- function() 
{
    big_matrix <- matrix(rnorm(2000*2000), nrow = 2000, ncol = 2000)

    exp.value <- 4.5
    prior <- function (x) rep(exp.value, nrow(x))

    rm(big_matrix) ## remove variable

    list(
        some_info = 5.18,
        prior = prior
    )
}
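To check the effect without writing a file, one option is to compare the length of the serialized objects in memory. This is only a sketch: the helper names with_big and without_big are hypothetical, the matrix is smaller than in the question, and the exact byte counts will vary between R versions and sessions.

```r
# Variant that leaves the large matrix in the closure environment.
with_big <- function() {
    big_matrix <- matrix(rnorm(200 * 200), nrow = 200, ncol = 200)
    exp.value <- 4.5
    prior <- function(x) rep(exp.value, nrow(x))
    list(some_info = 5.18, prior = prior)
}

# Variant that removes the large matrix before returning.
without_big <- function() {
    big_matrix <- matrix(rnorm(200 * 200), nrow = 200, ncol = 200)
    exp.value <- 4.5
    prior <- function(x) rep(exp.value, nrow(x))
    rm(big_matrix)
    list(some_info = 5.18, prior = prior)
}

# serialize(obj, NULL) returns a raw vector; its length approximates
# the uncompressed size of what save() would have to write.
size_with    <- length(serialize(with_big(), NULL))
size_without <- length(serialize(without_big(), NULL))
print(c(with = size_with, without = size_without))
```

The first number should be dominated by the 200 x 200 matrix of doubles (about 320 KB), while the second should be far smaller.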

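The problem with your list2env() approach is that the parent of the new environment defaults to parent.frame(), the environment list2env() was called from. Inside computation() that is the function's own environment, so the new environment still references everything in the function, including big_matrix, and save() serializes all of it. At top level the default parent is the global environment, which save() does not write into the file; that is why the same line worked there. You can instead specify the global environment as the parent explicitly: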

computation <- function() 
{
  big_matrix <- matrix(rnorm(2000*2000), nrow = 2000, ncol = 2000)

  exp.value <- 4.5
  prior <- function (x) rep(exp.value, nrow(x))
  # explicit parent
  environment(prior) <- list2env(list(exp.value = exp.value), parent = globalenv())

  list(
    some_info = 5.18,
    prior = prior
  )
}

(If you specify emptyenv() as the parent, the function won't be able to find built-in functions like rep().)
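The difference between the three parents can be seen directly by inspecting parent.env(). The following is a minimal sketch; make_env_default, make_env_global, prior_empty and the variable big are illustrative names, with big standing in for big_matrix.

```r
# Default parent: the frame of the function that calls list2env().
make_env_default <- function() {
    big <- numeric(1e5)
    list2env(list(exp.value = 4.5))
}

# Explicit parent: the global environment.
make_env_global <- function() {
    big <- numeric(1e5)
    list2env(list(exp.value = 4.5), parent = globalenv())
}

e_default <- make_env_default()
e_global  <- make_env_global()

# The default parent is the function's frame, which still holds `big`.
default_keeps_big <- exists("big", envir = parent.env(e_default),
                            inherits = FALSE)

# The explicit parent is the global environment itself.
global_parent_ok <- identical(parent.env(e_global), globalenv())

# With emptyenv() as the parent, the lookup of rep() fails at call time.
prior_empty <- function(x) rep(exp.value, nrow(x))
environment(prior_empty) <- list2env(list(exp.value = 4.5),
                                     parent = emptyenv())
empty_fails <- inherits(try(prior_empty(matrix(0, 2, 2)), silent = TRUE),
                        "try-error")

print(c(default_keeps_big, global_parent_ok, empty_fails))
```

If globals should not be visible to the function either, baseenv() may be worth trying as the parent: it keeps base functions such as rep() and nrow() reachable while skipping the global environment.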

Answered by MrFlick, Oct 03 '22 00:10