Accessing variables in closure in R

Tags:

closures

r

In the following example, why do f$i and f$get_i() return different results?

factory <- function() {

  my_list <- list()
  my_list$i <- 1

  my_list$increment <- function() {
    my_list$i <<- my_list$i + 1
  }

  my_list$get_i <- function() {
    my_list$i
  }

  my_list
}

f <- factory()

f$increment()
f$get_i() # returns 2
f$i # returns 1
RobinL asked Jun 29 '17 10:06


3 Answers

Your code follows a functional style, whereas R is more often used as a scripting language. Unless you know exactly what you are doing, it is bad practice to use <<- or to define functions inside functions.

You can find the explanation in the function environments section of Hadley Wickham's Advanced R.

An environment is the frame in which your code is evaluated. Environments can be nested, in the same way functions are.

When a function is created, an enclosing environment is attached to it; you can retrieve it with environment().

The function is executed in another environment, the execution environment, which is created fresh on every call. The execution environment is a child of the enclosing environment.
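
As a minimal sketch of these two environments (show_env is a hypothetical helper, not part of the question's code), a function can return its own execution environment so that two calls can be compared:

```r
# Hypothetical helper: returns the execution environment of the current call.
show_env <- function() environment()

e1 <- show_env()
e2 <- show_env()

# Each call gets its own fresh execution environment.
identical(e1, e2)  # FALSE

# The parent of the execution environment is the function's enclosing environment.
identical(parent.env(e1), environment(show_env))  # TRUE
```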

For example, on my laptop:

> environment()
<environment: R_GlobalEnv>
> environment(f$increment)
<environment: 0x0000000022365d58>
> environment(f$get_i)
<environment: 0x0000000022365d58>

f is an object located in the global environment.

The function increment has the enclosing environment 0x0000000022365d58 attached, which is the execution environment of the call to factory().

I quote from Hadley:

When you create a function inside another function, the enclosing environment of the child function is the execution environment of the parent, and the execution environment is no longer ephemeral.
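
To illustrate the quote, here is a small sketch using the factory from the question. The names f1, f2, same_call and other_call are made up for this comparison. It suggests that both functions returned by one call share that call's environment, while a second call gets its own:

```r
# Same factory as in the question.
factory <- function() {
  my_list <- list()
  my_list$i <- 1
  my_list$increment <- function() {
    my_list$i <<- my_list$i + 1
  }
  my_list$get_i <- function() {
    my_list$i
  }
  my_list
}

f1 <- factory()
f2 <- factory()

# Both functions from one call share that call's execution environment.
same_call <- identical(environment(f1$increment), environment(f1$get_i))
same_call   # TRUE

# A separate call to factory() creates a separate environment.
other_call <- identical(environment(f1$increment), environment(f2$increment))
other_call  # FALSE

# The environments persist after factory() returns, and they are independent.
f1$increment()
f1$get_i()  # 2
f2$get_i()  # 1
```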

When factory() is called, its execution environment is created with the my_list object in it, and that environment becomes the enclosing environment of both functions.

You can check this with ls():

> ls(envir = environment(f$increment))
[1] "my_list"
> ls(envir = environment(f$get_i))
[1] "my_list"

The <<- operator searches the parent environments for the variable being assigned. Here, the my_list it finds is the one in the immediately enclosing environment of the function, that is, the environment created by the call to factory().

So the increment is made only in that environment, not in the global one. f is a copy of my_list that was returned when factory() finished, so f$i stays at 1 while get_i() reads the updated my_list$i.
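
A short sketch of that difference, again with the factory from the question (inner is a name introduced here for illustration): the list stored in the enclosing environment and the list bound to f are two separate objects.

```r
# Same factory as in the question.
factory <- function() {
  my_list <- list()
  my_list$i <- 1
  my_list$increment <- function() {
    my_list$i <<- my_list$i + 1
  }
  my_list$get_i <- function() {
    my_list$i
  }
  my_list
}

f <- factory()
f$increment()

# The list that <<- modified lives in the enclosing environment.
inner <- environment(f$get_i)$my_list
inner$i    # 2

# f is the copy that was made when factory() returned.
f$i        # 1
f$get_i()  # 2, because get_i() reads the list in the enclosing environment
```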

You can see this by replacing the increment function with:

  my_list$increment <- function() {
    print("environment")
    print(environment())
    print("Parent environment")
    print(parent.env(environment()))
    my_list$i <<- my_list$i + 1
  }

This gives me:

> f$increment()
[1] "environment"
<environment: 0x0000000013c18538>
[1] "Parent environment"
<environment: 0x0000000022365d58>

You can use get() to access the result once you have stored the environment:

> my_main_env <- environment(f$increment)
> get("my_list", envir = my_main_env)
$i
[1] 2

$increment
function () 
{
    print("environment")
    print(environment())
    print("Parent environment")
    print(parent.env(environment()))
    my_list$i <<- my_list$i + 1
}
<environment: 0x0000000022365d58>

$get_i
function () 
{
    print("environment")
    print(environment())
    print("Parent environment")
    print(parent.env(environment()))
    my_list$i
}
<environment: 0x0000000022365d58>
YCR answered Oct 17 '22 14:10


f <- factory()

creates a my_list object with my_list$i = 1 and assigns a copy of it to f. So now f$i = 1.

f$increment() 

increments my_list$i only. It does not affect f.

Now

f$get_i() 

returns (previously incremented) my_list$i while

f$i 

returns unaffected f$i

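It's because you used the <<- operator, which does not create a local variable but assigns to my_list in the enclosing environment (the one created by the call to factory()); f holds a separate copy of that list. If you change your code to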

my_list$increment <- function() {
    # <- creates a local copy of my_list inside this call
    my_list$i <- my_list$i + 1
}

my_list will be incremented only inside the increment function, in a local copy that is discarded when the call returns. So now you get

> f$get_i() 
[1] 1
> f$i 
[1] 1

Let me add one more line to your code so we can see what happens inside increment:

  my_list$increment <- function() {
    my_list$i <- my_list$i + 1
    return(my_list$i)
  }

Now you can see that <- operates only inside increment, while <<- operated outside of it:

> f <- factory()
> f$increment()
[1] 2
> f$get_i() 
[1] 1
> f$i
[1] 1
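
The same contrast can be sketched in one place. This is only an illustration; make_counter, local_bump, outer_bump and the result names are hypothetical and not part of the question's code:

```r
# Hypothetical counter factory contrasting <- and <<-.
make_counter <- function() {
  count <- 0

  # <- creates a new local `count` on each call; the outer one is untouched.
  local_bump <- function() {
    count <- count + 1
    count
  }

  # <<- finds `count` in the enclosing environment and updates it there.
  outer_bump <- function() {
    count <<- count + 1
    count
  }

  list(local_bump = local_bump,
       outer_bump = outer_bump,
       get = function() count)
}

cnt <- make_counter()

l1 <- cnt$local_bump()  # 1
l2 <- cnt$local_bump()  # 1 again, the change never leaves the call
after_local <- cnt$get()  # 0

o1 <- cnt$outer_bump()  # 1
o2 <- cnt$outer_bump()  # 2
after_outer <- cnt$get()  # 2
```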
Łukasz Deryło answered Oct 17 '22 14:10


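Based on comments from @Cath on "value by reference", I was inspired to come up with this. data.table's := modifies the table in place, so f$i and my_list$i keep pointing at the same object.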

library(data.table)

factory <- function() {
  my_list <- list()
  my_list$i <- data.table(1)  # single column, auto-named V1

  my_list$increment <- function() {
    # := updates column V1 by reference, so no <<- is needed
    my_list$i[, V1 := V1 + 1]
  }

  my_list$get_i <- function() {
    my_list$i
  }

  my_list
}
f <- factory()
f$increment()
f$get_i() # returns 2
   V1
1:  2
f$i # also returns 2
   V1
1:  2
f$increment()
f$get_i() # returns 3
   V1
1:  3
f$i # also returns 3
   V1
1:  3
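
For comparison, a similar reference-style result can be sketched in base R by returning an environment instead of a list. This is an alternative shown under my own assumptions, not what the answer above does; factory_env, self and g are hypothetical names:

```r
# Hypothetical variant: the returned object is an environment, which R
# passes by reference, so there is only one copy of the state.
factory_env <- function() {
  self <- new.env()
  self$i <- 1

  self$increment <- function() {
    # Modifying an environment changes it in place for every holder.
    self$i <- self$i + 1
    invisible(self$i)
  }

  self$get_i <- function() {
    self$i
  }

  self
}

g <- factory_env()
g$increment()
g$get_i()  # 2
g$i        # 2, same object, so both views agree
```

Because environments are not copied on assignment, no <<- is needed here, and g$i and g$get_i() cannot drift apart.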
Roman Luštrik answered Oct 17 '22 15:10