I am trying to understand how to succinctly implement something like the argument capture/parsing/evaluation mechanism that enables the following behavior with dplyr::tibble()
(FKA dplyr::data_frame()
):
# `b` finds `a` in previous arg
dplyr::tibble(a=1:5, b=a+1)
## a b
## 1 2
## 2 3
## ...
# `b` can't find `a` bc it doesn't exist yet
dplyr::tibble(b=a+1, a=1:5)
## Error in eval_tidy(xs[[i]], unique_output) : object 'a' not found
With base::
classes like data.frame
and list
, this isn't possible (maybe bc arguments aren't interpreted sequentially(?) and/or maybe bc they get evaluated in the parent environment(?)):
data.frame(a=1:5, b=a+1)
## Error in data.frame(a = 1:5, b = a + 1) : object 'a' not found
list(a=1:5, b=a+1)
## Error: object 'a' not found
So my question is: what might be a good strategy in base R to write a function list2()
that is just like base::list()
except that it allows tibble()
behavior like list2(a=1:5, b=a+1)
??
I'm aware that this is part of what "tidyeval" does, but I am interested in isolating the exact mechanism that makes this trick possible. And I'm aware that one could just say list(a <- 1:5, b <- a+1)
, but I am looking for a solution that does not use global assignment.
What I've been thinking so far: One inelegant and unsafe way to achieve the desired behavior would be the following -- first parse the arguments into strings, then create an environment, add each element to that environment, put them into a list, and return (suggestions for better ways to parse ...
into a named list appreciated!):
list2 <- function(...){
# (gross bc we are converting code to strings and then back again)
argstring <- as.character(match.call(expand.dots=FALSE))[2]
argstring <- gsub("^pairlist\\((.+)\\)$", "\\1", argstring)
# (terrible bc commas aren't allowed except to separate args!!!)
argstrings <- strsplit(argstring, split=", ?")[[1]]
env <- new.env()
# (icky bc all args must have names)
for (arg in argstrings){
eval(parse(text=arg), envir=env)
}
vars <- ls(env)
out <- list()
for (var in vars){
out <- c(out, list(eval(parse(text=var), envir=env)))
}
return(setNames(out, vars))
}
This allows us to derive the basic behavior, but it doesn't generalize well at all (see comments in list2()
definition):
list2(a=1:5, b=a+1)
## $a
## [1] 1 2 3 4 5
##
## $b
## [1] 2 3 4 5 6
We could introduce hacks to fix little things like producing names when they aren't supplied, e.g. like this:
# (still gross but at least we don't have to supply names for everything)
list3 <- function(...){
argstring <- as.character(match.call(expand.dots=FALSE))[2]
argstring <- gsub("^pairlist\\((.+)\\)$", "\\1", argstring)
argstrings <- strsplit(argstring, split=", ?")[[1]]
env <- new.env()
# if a name isn't supplied, create one of the form `v1`, `v2`, ...
ctr <- 0
for (arg in argstrings){
ctr <- ctr+1
if (grepl("^[a-zA-Z_] ?= ?", arg))
eval(parse(text=arg), envir=env)
else
eval(parse(text=paste0("v", ctr, "=", arg)), envir=env)
}
vars <- ls(env)
out <- list()
for (var in vars){
out <- c(out, list(eval(parse(text=var), envir=env)))
}
return(setNames(out, vars))
}
Then instead of this:
# evaluates `a+b-2`, but doesn't include in `env`
list2(a=1:5, b=a+1, a+b-2)
## $a
## [1] 1 2 3 4 5
##
## $b
## [1] 2 3 4 5 6
We get this:
list3(a=1:5, b=a+1, a+b-2)
## $a
## [1] 1 2 3 4 5
##
## $b
## [1] 2 3 4 5 6
##
## $v3
## [1] 1 3 5 7 9
But it feels like there will still be problematic edge cases even if we fix the issue with commas, with names, etc.
Anyone have any ideas/suggestions/insights/solutions/etc.??
Many thanks!
The reason data.frame(a=1:5, b=a+1)
doesn't work is a scoping issue, not an evaluation order issue.
Arguments to a function are normally evaluated in the calling frame. When you say a+1
, you are referring to the variable a
in the frame that made the call to data.frame
, not the column that you are about to create.
dplyr::data_frame
does very non-standard evaluation, so it can mix up frames as you saw. It appears to look first in the frame corresponding to the object that is under construction, and second in the usual place.
One way to use the dplyr
semantics with a base function is to do both,
e.g.
do.call(data.frame, as.list(dplyr::data_frame(a = 1:5, b = a+1)))
but this is kind of useless: you can convert a tibble to a dataframe directly, and this can't be used with other base functions, since it forces all arguments to the same length.
To write your list2
function, I'd recommend looking at the source of dplyr::data_frame
, and do everything it does except the final conversion to a tibble. It's source is deceptively short:
function (...)
{
xs <- quos(..., .named = TRUE)
as_tibble(lst_quos(xs, expand = TRUE))
}
This is deceptive, because lst_quos
is a private function in the tibble
package, so you'll need your own copy of that, plus any private functions it calls, etc. Unless of course you don't mind using private functions, then here's your list2
:
list2 <- function(...) {
xs <- rlang::quos(..., .named = TRUE)
tibble:::lst_quos(xs, expand = TRUE)
}
This will work until the tibble
maintainer chooses to change lst_quos
, which he's free to do without warning (since it's private). It wouldn't be acceptable code in a CRAN package because of this fragility.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With