Distinct enclosing environment, function environment, etc. in R

I have a few questions about the different environments of a function. Take the following example:

environment(sd)
# <environment: namespace:stats>

Does namespace:stats point to the enclosing environment of function sd?

pryr::where("sd")
# <environment: package:stats>

Does package:stats point to the binding environment of function sd?

According to Advanced R by Hadley Wickham: "The enclosing environment belongs to the function, and never changes..."

But the enclosing environment of a function can be changed, as shown below:

new.env <- new.env()
environment(f) <- new.env

A function's environment property indicates the function's execution environment, correct? (See also an online article on how R finds things through environments.)

To sum up my questions:

  1. Can we actually change the enclosing environment of a function or not?
  2. What are those two different environments of the stats package?
  3. What is the function's environment?

This is similar to a previous post here.

juanli asked Jun 08 '17 06:06

2 Answers

TLDR:

  1. Yes, you can change the enclosing environment. Hadley was probably talking about packaged functions.
  2. They are the enclosing environment (namespace:stats) and the binding environment (package:stats). You were correct.
  3. environment() returns the enclosing environment. The execution environment is a different thing: it only exists for the time the function runs.

Function environments

You have to distinguish 4 different environments when talking about a function:

  • the binding environment is the environment where the function is found (i.e. where its name exists). This is where the actual binding of an object to its name is done. find() gives you the binding environment.
  • the enclosing environment is the environment where the function is originally created. This is not necessarily the same as the binding environment (see examples below). environment() gives you the enclosing environment.
  • the local environment is the environment within the function. This is what you called the execution environment.
  • the parent frame or calling environment is the environment from where the function was called.
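
As a rough sketch, the four environments can be inspected from R itself. The names make_reporter, reporter, creation_env and report are made up for this illustration; only base R functions are used.

```r
# Hypothetical names throughout; this is an illustration, not a canonical recipe.
make_reporter <- function() {
  creation_env <- environment()      # execution environment of this make_reporter() call
  reporter <- function() {
    list(
      local  = environment(),        # execution (local) environment of this call
      caller = parent.frame()        # environment the call was made from
    )
  }
  list(reporter = reporter, creation_env = creation_env)
}

here   <- environment()              # the environment this script runs in
made   <- make_reporter()
report <- made$reporter              # the name `report` is bound in `here`
envs   <- report()

# binding environment: where the name `report` lives
binding_ok <- exists("report", envir = here, inherits = FALSE)

# enclosing environment: where the function was created
enclosing_ok <- identical(environment(report), made$creation_env)

# local environment: created fresh for the call, enclosed by the enclosing environment
local_ok <- identical(parent.env(envs$local), environment(report))

# parent frame: the environment from which report() was called
caller_ok <- identical(envs$caller, here)

c(binding_ok, enclosing_ok, local_ok, caller_ok)
```

Here the binding environment (where the script runs) and the enclosing environment (the execution environment of make_reporter()) differ, which is the situation the second bullet describes.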

Why does this matter

Each of these environments has a specific role:

  • the binding environment is the environment where you find the function.
  • the local environment is the first environment where R looks for objects.
  • the general rule is: if R doesn't find an object in the local environment, it then looks in the enclosing environment and so on. The last enclosing environment is always emptyenv().
  • the parent frame is where R looks for the value of the objects passed as arguments.
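
A minimal sketch of these lookup rules, assuming three different variables that all happen to be called x (make_lookup, lookup and caller are hypothetical names):

```r
x <- "global"

make_lookup <- function() {
  x <- "enclosing"
  function(arg) {
    # `x` is not defined locally, so R continues in the enclosing environment
    c(free_variable = x, argument = arg)
  }
}
lookup <- make_lookup()

caller <- function() {
  x <- "caller"
  lookup(x)   # the argument expression `x` is evaluated in caller()'s frame
}

result <- caller()
result
#> free_variable      argument
#>   "enclosing"      "caller"
```

The free variable is resolved through the enclosing environment, while the argument gets its value from the parent frame, even though both are spelled x.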

You can change the enclosing environment

Indeed, you can change the enclosing environment. What you cannot change is the enclosing environment of a function that lives in a package. If you try, you don't change the original; you actually create a modified copy in the environment you are working in:

> ls()
character(0)
> environment(sd)
<environment: namespace:stats>
> environment(sd) <- globalenv()
> environment(sd)
<environment: R_GlobalEnv>
> ls()
[1] "sd"
> find("sd")
[1] ".GlobalEnv"    "package:stats" # two functions sd now
> rm(sd)
> environment(sd)
<environment: namespace:stats>

In this case, the second sd has the global environment as both its enclosing and its binding environment, but the original sd is still found inside the package environment, and its enclosing environment is still the namespace of that package.

The confusion might arise when you do the following:

> f <- sd
> environment(f)
<environment: namespace:stats>
> find("f")
[1] ".GlobalEnv"

What happens here? The enclosing environment is still the namespace "stats". That's where the function was created. However, the binding environment is now the global environment. That's where the name "f" is bound to the object.

We can change the enclosing environment to a new environment e. If you check now, the enclosing environment becomes e, but e itself is empty. f is still bound in the global environment.

> e <- new.env()
> e
<environment: 0x000000001852e0a8>
> environment(f) <- e
> find("f")
[1] ".GlobalEnv"
> environment(f)
<environment: 0x000000001852e0a8>
> ls(e)
character(0)

The enclosing environment of e is the global environment. So f still works as if its enclosure was the global environment. The environment e is enclosed in it, so if something isn't found in e, the function looks in the global environment and so on.

But because e is an environment rather than a function, R calls its enclosure a parent environment.

> parent.env(e)
<environment: R_GlobalEnv>
> f(1:3)
[1] 1 
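
To see the whole chain rather than just one step, a small helper can walk parent.env() until it reaches the empty environment. This is only a sketch; env_chain is a hypothetical helper, not a base R function.

```r
# Hypothetical helper: collect the names of an environment and all its parents
env_chain <- function(env) {
  chain <- character(0)
  repeat {
    chain <- c(chain, environmentName(env))   # "" for anonymous environments
    if (identical(env, emptyenv())) break
    env <- parent.env(env)
  }
  chain
}

e <- new.env(parent = globalenv())
chain <- env_chain(e)

chain[1:2]
#> [1] ""            "R_GlobalEnv"
chain[length(chain)]
#> [1] "R_EmptyEnv"
```

The entries in between are the attached packages, in the order given by search(), followed by the base environment.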

Namespaces and package environments

This principle is also the "trick" packages use:

  • the function is created in the namespace. This is an environment that is enclosed by the namespaces of other imported packages, and eventually the global environment.
  • the binding for the function is created in the package environment. This is an environment that encloses the global environment and possibly other packages.

The reason for this is simple: objects can only be found inside the environment you are in, or in its enclosing environments.

  • a function must be able to find other functions (objects), so the local environment must be enclosed by (possibly) the namespaces of other packages it imports, the base package and lastly the global environment.
  • a function must be findable from within the global environment. Hence the binding (i.e. the name of the function) must be in an environment that encloses the global environment. This is the package environment (NOT the namespace!)
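
Both sides of this arrangement can be checked for sd with base R alone. The sketch below assumes that the stats package is attached, as it is in a default R session; the variable names are made up for the illustration.

```r
ns  <- asNamespace("stats")              # namespace:stats, where sd was created
pkg <- as.environment("package:stats")   # package:stats, where the name sd is bound

fun <- get("sd", envir = pkg, inherits = FALSE)
enclosed_by_namespace <- identical(environment(fun), ns)

# The namespace holds internal objects that the package environment does not expose
has_internals <- length(setdiff(ls(ns, all.names = TRUE),
                                ls(pkg, all.names = TRUE))) > 0

# Namespace side: imports -> base namespace -> global environment
imports_env       <- parent.env(ns)
base_ns_next      <- identical(parent.env(imports_env), .BaseNamespaceEnv)
global_after_base <- identical(parent.env(.BaseNamespaceEnv), globalenv())

# Package side: walking up from the global environment reaches package:stats
reachable <- FALSE
env <- globalenv()
while (!identical(env, emptyenv())) {
  if (identical(env, pkg)) { reachable <- TRUE; break }
  env <- parent.env(env)
}

c(enclosed_by_namespace, has_internals, base_ns_next, global_after_base, reachable)
```

So the function body resolves names through the namespace side first, while users reach the function through the package environment on the search path.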

An illustration:

[illustration not reproduced here]

Now suppose you make an environment with the empty environment as a parent. If you use this as the enclosing environment for a function, nothing works any longer, because you now bypass all the package environments and can't find a single function any more.

> orphan <- new.env(parent = emptyenv())
> environment(f) <- orphan
> f(1:3)
Error in sqrt(var(if (is.vector(x) || is.factor(x)) x else as.double(x),  : 
  could not find function "sqrt"

The parent frame

This is where it gets interesting. The parent frame, or calling environment, is the environment where the values passed as arguments are looked up. But that parent frame can be the local environment of another function. In this case R looks first in the local environment of that other function, then in the enclosing environment of the calling function, and so on all the way up to the global environment and the environments of the attached packages, until it reaches the empty environment. That's where the "object not found" bug sleeps.
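
A small sketch of the difference between a free variable and an argument. All names are hypothetical, and it assumes that no object called local_only exists anywhere on the search path.

```r
read_free_var <- function() {
  # `local_only` is a free variable here: R searches the enclosing
  # environments of read_free_var(), not the frame of whoever calls it
  tryCatch(local_only, error = function(e) "object not found")
}

read_argument <- function(x) x

wrapper <- function() {
  local_only <- 42
  list(
    as_free_variable = read_free_var(),           # wrapper()'s frame is not searched
    as_argument      = read_argument(local_only)  # evaluated in wrapper()'s frame
  )
}

res <- wrapper()
res$as_free_variable
#> [1] "object not found"
res$as_argument
#> [1] 42
```

The value reaches the callee only when it is passed as an argument, because only arguments are evaluated in the parent frame.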

Joris Meys answered Oct 20 '22 22:10


environment(fun) gives the enclosing environment of the function fun (i.e. the environment of its closure), which is a pointer to the environment in which the function was defined. This convention is called lexical scoping, and it is what lets you use patterns like factory functions. Here is a simple example:

factory <- function(){
    # get a reference to the current environment -- i.e. the environment 
    # that was created when the function `factory` was called.
    envir = environment()
    data <- 0
    add <- function(x=1){
        # we can use the lexical scoping assignment operator to re-assign the value of data
        data <<- data + x
        # return the value of the lexically scoped variable `data`
        return(data)
    }
    return(list(envir=envir,add=add))
}

L = factory()

# check that the environment for L$add is the environment in which it was created
identical(L$envir,environment(L$add))
#> TRUE

L$add()
#> 1
L$add(3)
#> 4

Note that we can re-assign the value of data in the enclosing environment using assign(), like so:

assign("data",100,L$envir)
L$add()
#> 101

Also, when we call the function factory() again, another new environment is created and assigned as the closure environment for the functions defined in that call. This is what allows us to have two separate add() functions which scope to their own separate environments:

M = factory()
M$add()
#> 1
M$add()
#> 2
L$add()
#> 102

The above factory function illustrates the link between a function and its enclosing environment via the continued search for a variable (and the use of the scoping assignment operator), whereas the following illustrates the link between a local environment and the calling frame via promises, which is how R passes variables in a function call.

Specifically, when you call a function, R creates promises for the values of the variables and expressions passed. The value of a promise is passed (copied) from the variable or expression by evaluating the promise in the context of the calling environment when the parameter is force()'d or used, and not sooner!

For example, this factory function takes a parameter which is stored as a promise until the returned function is called:

factory2 <- function(x){
    out <-function(){
         return(x)
    }
    return(out)
}

Now factory2 behaves intuitively in some cases:

y = 1
f = factory2(y)
f()
#> 1

but not in others:

y = 1
h = factory2(y)
y = 2
h()
#> 2

because the promise for the expression y is not evaluated until h() is called, and in the second example, the value of y is 2! Of course, now that the value has been copied from the calling environment into the local environment via Promise evaluation, changing the value of y won't affect the value returned by h():

y = 3
h()
#> 2
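
If the earlier value is the one you want, one common remedy is to force the promise inside the factory. The following is a sketch of that idea; factory_forced and h_forced are made-up names.

```r
factory_forced <- function(x) {
  force(x)        # evaluate the promise now, in the caller's environment
  function() x
}

y <- 1
h_forced <- factory_forced(y)
y <- 2
h_forced()
#> 1
```

Because the promise is evaluated while y is still 1, later changes to y no longer affect the returned function.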

Jthorpe answered Oct 20 '22 21:10