Here is an example from Hadley's Advanced R book:
sample_df <- data.frame(a = 1:5, b = 5:1, c = c(5, 3, 1, 4, 1))
subset2 <- function(x, condition) {
  condition_call <- substitute(condition)
  r <- eval(condition_call, x, parent.frame())
  x[r, ]
}
scramble <- function(x) x[sample(nrow(x)), ]
subscramble <- function(x, condition) {
  scramble(subset2(x, condition))
}
subscramble(sample_df, a >= 4)
# Error in eval(expr, envir, enclos) : object 'a' not found
Hadley explains:
Can you see what the problem is? condition_call contains the expression condition. So when we evaluate condition_call it also evaluates condition, which has the value a >= 4. However, this can’t be computed because there’s no object called a in the parent environment.
I understand that there is no a in the parent env, but eval(condition_call, x, parent.frame()) evals condition_call in x (a data.frame used as an environment), enclosed by parent.frame(). As long as there is a column named a in x, why should there be any problem?
When subset2() is called from within subscramble(), condition_call's value is the symbol condition (rather than the call a >= 4 that results when it is called directly). subset2()'s call to eval() searches for condition first in envir=x (the data.frame sample_df). Not finding it there, it next searches in enclos=parent.frame(), where it does find an object named condition.
That object is a promise object, whose expression slot is a >= 4 and whose evaluation environment is .GlobalEnv. Unless an object named a is found in .GlobalEnv or further up the search path, evaluation of the promise then fails with the observed message: Error in eval(expr, envir, enclos) : object 'a' not found.
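A minimal sketch of that difference, assuming two hypothetical helpers, capture() and wrapper(), that are not part of the book's code. It only shows what substitute() hands back in the direct and in the nested case:

```r
# Hypothetical helpers, for illustration only
capture <- function(condition) substitute(condition)  # returns the unevaluated argument
wrapper <- function(condition) capture(condition)     # passes its own argument along

direct <- capture(a >= 4)   # called directly: the call a >= 4
nested <- wrapper(a >= 4)   # called through a wrapper: the symbol condition

class(direct)
# [1] "call"
class(nested)
# [1] "name"
```

Neither call evaluates a >= 4, so no object named a has to exist for this to run.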
A nice way to discover what's going wrong here is to insert a browser() call right before the line at which subset2() fails. That way, we can call it both directly and indirectly (from within another function), and examine why it succeeds in the first case and fails in the second.
subset2 <- function(x, condition) {
  condition_call <- substitute(condition)
  browser()
  r <- eval(condition_call, x, parent.frame()) ## <- Point of failure
  x[r, ]
}
When a user calls subset2() directly, condition_call <- substitute(condition) assigns to condition_call a "call" object containing the unevaluated call a >= 4. This call is passed in to eval(expr, envir, enclos), which needs as its first argument a symbol that evaluates to an object of class call, name, or expression. So far so good.
subset2(sample_df, a >= 4)
## Called from: subset2(sample_df, a >= 4)
Browse[1]> is(condition_call)
## [1] "call" "language"
Browse[1]> condition_call
## a >= 4
eval() now sets to work, searching for the values of any symbols contained in expr=condition_call first in envir=x and then (if needed) in enclos=parent.frame() and its enclosing environments. In this case, it finds the symbol a in envir=x (and the symbol >= in package:base) and successfully completes the evaluation.
Browse[1]> ls(x)
## [1] "a" "b" "c"
Browse[1]> get("a", x)
## [1] 1 2 3 4 5
Browse[1]> eval(condition_call, x, parent.frame())
## [1] FALSE FALSE FALSE TRUE TRUE
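The lookup order can be sketched outside the browser as well. This assumes a hypothetical function rows_at_least() and a local variable threshold, neither of which appears in the original example; a is resolved from the data frame and threshold from enclos:

```r
sample_df <- data.frame(a = 1:5, b = 5:1, c = c(5, 3, 1, 4, 1))

# Hypothetical helper, for illustration only
rows_at_least <- function(df) {
  threshold <- 4   # lives in this function's frame, not in df
  # `a` is found in envir = df; `threshold` is not a column, so eval()
  # falls back to enclos, which here is the function's own frame
  eval(quote(a >= threshold), df, environment())
}

res <- rows_at_least(sample_df)
res
# [1] FALSE FALSE FALSE  TRUE  TRUE
```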
Within the body of subscramble(), subset2() is called like this: subset2(x, condition). Fleshed out, that call is really equivalent to subset2(x=x, condition=condition). Because its supplied argument (i.e. the value passed to the formal argument named condition) is the expression condition, condition_call <- substitute(condition) assigns to condition_call the symbol object condition. (Understanding that point is pretty key to understanding exactly how the nested call fails.)
Since eval() is happy to have a symbol (aka "name") as its first argument, once again so far so good.
subscramble(sample_df, a >= 4)
## Called from: subset2(x, condition)
Browse[1]> is(condition_call)
## [1] "name" "language" "refObject"
Browse[1]> condition_call
## condition
Now eval() goes to work searching for the unresolved symbol condition. No column in envir=x (the data.frame sample_df) matches, so it moves on to enclos=parent.frame(). For fairly complicated reasons, that environment turns out to be the evaluation frame of the call to subscramble(). There, it does find an object named condition.
Browse[1]> ls(x)
## [1] "a" "b" "c"
Browse[1]> ls(parent.frame()) ## Aha! Here's an object named "condition"
## [1] "condition" "x"
As an important aside, it turns out there are several objects named condition on the call stack above the environment from which browser() was called.
Browse[1]> sys.calls()
# [[1]]
# subscramble(sample_df, a >= 4)
#
# [[2]]
# scramble(subset2(x, condition))
#
# [[3]]
# subset2(x, condition)
#
Browse[1]> sys.frames()
# [[1]]
# <environment: 0x0000000007166f28> ## <- Envt in which `condition` is evaluated
#
# [[2]]
# <environment: 0x0000000007167078>
#
# [[3]]
# <environment: 0x0000000007166348> ## <- Current environment
## Orient ourselves a bit more
Browse[1]> environment()
# <environment: 0x0000000007166348>
Browse[1]> parent.frame()
# <environment: 0x0000000007166f28>
## Both environments contain objects named 'condition'
Browse[1]> ls(environment())
# [1] "condition" "condition_call" "x"
Browse[1]> ls(parent.frame())
# [1] "condition" "x"
To inspect the condition object found by eval() (the one in parent.frame(), which turns out to be the evaluation frame of subscramble()) takes some special care. I used recover() and pryr::promise_info() as shown below.
That inspection reveals that condition is a promise whose expression slot is a >= 4 and whose environment is .GlobalEnv. Our search for a has by this point moved well past sample_df (where a value of a was to be found), so evaluation of the expression slot fails (unless an object named a is found in .GlobalEnv or somewhere else farther up the search path).
Browse[1]> library(pryr) ## For is_promise() and promise_info()
Browse[1]> recover()
#
# Enter a frame number, or 0 to exit
#
# 1: subscramble(sample_df, a >= 4)
# 2: #2: scramble(subset2(x, condition))
# 3: #1: subset2(x, condition)
#
Selection: 1
# Called from: top level
Browse[3]> is_promise(condition)
# [1] TRUE
Browse[3]> promise_info(condition)
# $code
# a >= 4
#
# $env
# <environment: R_GlobalEnv>
#
# $evaled
# [1] FALSE
#
# $value
# NULL
#
Browse[3]> get("a", .GlobalEnv)
# Error in get("a", .GlobalEnv) : object 'a' not found
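The same behaviour can be sketched without pryr. The names force_in_df() and caller() below are hypothetical and chosen only for this illustration; the point is that a promise found through enclos is evaluated in the environment it was created in, not in the data frame passed as envir:

```r
# Hypothetical helper: looks up the symbol `arg` with a data frame as envir
force_in_df <- function(arg, df) {
  # `arg` is not a column of df, so eval() finds the promise in enclos
  # and forces it in the environment the promise was created in
  eval(quote(arg), df, environment())
}

caller <- function() {
  a <- 100   # the `a` visible where the promise is created
  force_in_df(a + 1, data.frame(a = 1:3))
}

res <- caller()
res
# [1] 101
```

The column a = 1:3 is never consulted. In the subscramble() example the analogous lookup starts in .GlobalEnv, where there is no a at all, hence the error.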
For one more piece of evidence that the promise object condition is being found in enclos=parent.frame(), one can point enclos somewhere else farther up the search path, so that parent.frame() is skipped during condition_call's evaluation. When one does that, subscramble() again fails, but this time with a message that condition itself was not found.
## Compare
Browse[1]> eval(condition_call, x, parent.frame())
# Error in eval(expr, envir, enclos) (from #4) : object 'a' not found
Browse[1]> eval(condition_call, x, .GlobalEnv)
# Error in eval(expr, envir, enclos) (from #4) : object 'condition' not found
This was a tricky one, so thanks for the question. The error has to do with how substitute acts when it's called on an argument. If we look at the help text from substitute():
Substitution takes place by examining each component of the parse tree as follows: If it is not a bound symbol in env, it is unchanged. If it is a promise object, i.e., a formal argument to a function or explicitly created using delayedAssign(), the expression slot of the promise replaces the symbol.
What this means is that when condition is substituted within the nested subset2 function, substitute sets condition_call to the expression slot of the promise, which in the nested case is just the symbol condition rather than a >= 4. That symbol in turn refers to the promise for the still unevaluated 'condition' argument of subscramble. Since promise objects are pretty obscure, the definition is here: http://cran.r-project.org/doc/manuals/r-release/R-lang.html#Promise-objects
The key points from there are:
Promise objects are part of R’s lazy evaluation mechanism. They contain three slots: a value, an expression, and an environment.
and
When the argument is accessed, the stored expression is evaluated in the stored environment, and the result is returned
Basically, within the nested function, condition_call is set to the symbol condition, rather than the actual expression a >= 4. When eval() looks that symbol up, it does not find it in the data frame, so it finds the promise in the calling function's frame instead. Because promise objects 'remember' the environment they come from, the promise's expression is then evaluated in that stored environment, regardless of the second argument to eval(), and in that environment there is no 'a'.
You can create promise objects with delayedAssign() and observe this directly:
delayedAssign("condition", a >= 4)
substitute(condition)
eval(substitute(condition), sample_df)
You can see that substitute(condition) does not return a >= 4, but simply condition, and that trying to evaluate it within the environment of sample_df fails as it does in Hadley's example.
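One caveat, as far as I can tell from the help page: substitute() leaves symbols unchanged when env is .GlobalEnv, which would explain why the top-level call above returns condition. A minimal sketch of the contrast, assuming a scratch environment e that is not part of the original example:

```r
sample_df <- data.frame(a = 1:5, b = 5:1, c = c(5, 3, 1, 4, 1))

e <- new.env()   # any environment other than .GlobalEnv (hypothetical scratch env)
delayedAssign("condition", a >= 4, assign.env = e)

expr <- substitute(condition, e)   # expression slot of the promise in e
expr
# a >= 4

res <- eval(expr, sample_df)       # the extracted call is evaluated against the columns
res
# [1] FALSE FALSE FALSE  TRUE  TRUE
```

The promise itself is never forced here; only its expression slot is extracted and then evaluated with sample_df as envir.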
Hopefully this is helpful, and I'm sure someone else can clarify further.
In case anyone else stumbles upon this thread, here is the answer to task #5 below this section in Hadley's book. It also contains a possible general solution to the problem discussed above.
subset2 <- function(x, condition, env = parent.frame()) {
  condition_call <- substitute(condition, env)
  r <- eval(condition_call, x, env)
  x[r, ]
}
scramble <- function(x) x[sample(nrow(x)), ]
subscramble <- function(x, condition) {
  scramble(subset2(x, condition))
}
subscramble(sample_df, a >= 3)
The magic happens in the second line of subset2. There, substitute receives an explicit env argument. From the help section for substitute: "substitute returns the parse tree for the (unevaluated) expression expr, substituting any variables bound in env." env "Defaults to the current evaluation environment". Instead, we use the calling environment.
Check it out like this:
debugonce(subset2)
subscramble(sample_df, a >= 3)
Browse[2]> substitute(condition)
condition
Browse[2]> substitute(condition, env)
a >= 3
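For a check that does not need the debugger, the same definitions can be run end to end. This is only a sketch; set.seed() and the sort() on the result are my additions so that the shuffled output is easy to compare:

```r
sample_df <- data.frame(a = 1:5, b = 5:1, c = c(5, 3, 1, 4, 1))

subset2 <- function(x, condition, env = parent.frame()) {
  condition_call <- substitute(condition, env)  # look the promise up in the caller
  r <- eval(condition_call, x, env)
  x[r, ]
}
scramble <- function(x) x[sample(nrow(x)), ]
subscramble <- function(x, condition) {
  scramble(subset2(x, condition))
}

set.seed(1)   # added only to make the shuffle reproducible
out <- subscramble(sample_df, a >= 3)
sort(out$a)   # row order is random, so compare the sorted column
# [1] 3 4 5
```

One thing to keep in mind: this only exercises the nested call. Since substitute() leaves symbols unchanged when env is .GlobalEnv, I would not expect this version of subset2() to capture the condition when it is called directly from the top level.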
I am not 100% certain about the explanation here. I think it just is the way substitute works. From the help page for substitute:
Substitution takes place by examining each component of the parse tree as follows: (...) If it is a promise object, i.e., a formal argument to a function or explicitly created using delayedAssign(), the expression slot of the promise replaces the symbol. If it is an ordinary variable, its value is substituted (...).
In the current environment (the frame of subset2), condition is a promise whose expression slot is the symbol condition, so condition_call receives a symbol as its value. In the calling environment (the frame of subscramble), condition is also a promise, but its expression slot is a >= 3, so that expression is what gets substituted.