Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Sequential evaluation of named arguments in R

Tags:

r

dplyr

nse

tibble

I am trying to understand how to succinctly implement something like the argument capture/parsing/evaluation mechanism that enables the following behavior with dplyr::tibble() (FKA dplyr::data_frame()):

# `b` finds `a` in previous arg
dplyr::tibble(a=1:5, b=a+1)
##  a  b 
##  1  2 
##  2  3 
##   ...

# `b` can't find `a` bc it doesn't exist yet
dplyr::tibble(b=a+1, a=1:5)
## Error in eval_tidy(xs[[i]], unique_output) : object 'a' not found

With base:: classes like data.frame and list, this isn't possible (maybe bc arguments aren't interpreted sequentially(?) and/or maybe bc they get evaluated in the parent environment(?)):

data.frame(a=1:5, b=a+1)
## Error in data.frame(a = 1:5, b = a + 1) : object 'a' not found

list(a=1:5, b=a+1)
## Error: object 'a' not found

So my question is: what might be a good strategy in base R to write a function list2() that is just like base::list() except that it allows tibble() behavior like list2(a=1:5, b=a+1)??

I'm aware that this is part of what "tidyeval" does, but I am interested in isolating the exact mechanism that makes this trick possible. And I'm aware that one could just say list(a <- 1:5, b <- a+1), but I am looking for a solution that does not use global assignment.

What I've been thinking so far: One inelegant and unsafe way to achieve the desired behavior would be the following -- first parse the arguments into strings, then create an environment, add each element to that environment, put them into a list, and return (suggestions for better ways to parse ... into a named list appreciated!):

list2 <- function(...){

  # (gross bc we are converting code to strings and then back again)
  argstring <- as.character(match.call(expand.dots=FALSE))[2]
  argstring <- gsub("^pairlist\\((.+)\\)$", "\\1", argstring)

  # (terrible bc commas aren't allowed except to separate args!!!)
  argstrings <- strsplit(argstring, split=", ?")[[1]]

  env <- new.env()

  # (icky bc all args must have names)
  for (arg in argstrings){
    eval(parse(text=arg), envir=env)
  }

  vars <- ls(env)
  out <- list()

  for (var in vars){
    out <- c(out, list(eval(parse(text=var), envir=env)))
  }
  return(setNames(out, vars))
}

This allows us to derive the basic behavior, but it doesn't generalize well at all (see comments in list2() definition):

list2(a=1:5, b=a+1)
## $a
## [1] 1 2 3 4 5
## 
## $b
## [1] 2 3 4 5 6

We could introduce hacks to fix little things like producing names when they aren't supplied, e.g. like this:

# (still gross but at least we don't have to supply names for everything)
list3 <- function(...){
  argstring <- as.character(match.call(expand.dots=FALSE))[2]
  argstring <- gsub("^pairlist\\((.+)\\)$", "\\1", argstring)
  argstrings <- strsplit(argstring, split=", ?")[[1]]
  env <- new.env()
  # if a name isn't supplied, create one of the form `v1`, `v2`, ...
  ctr <- 0
  for (arg in argstrings){
    ctr <- ctr+1
    if (grepl("^[a-zA-Z_] ?= ?", arg))
      eval(parse(text=arg), envir=env)
    else
      eval(parse(text=paste0("v", ctr, "=", arg)), envir=env)
  }
  vars <- ls(env)
  out <- list()
  for (var in vars){
    out <- c(out, list(eval(parse(text=var), envir=env)))
  }
  return(setNames(out, vars))
}

Then instead of this:

# evaluates `a+b-2`, but doesn't include in `env`
list2(a=1:5, b=a+1, a+b-2) 
## $a
## [1] 1 2 3 4 5
## 
## $b
## [1] 2 3 4 5 6

We get this:

list3(a=1:5, b=a+1, a+b-2)
## $a
## [1] 1 2 3 4 5
## 
## $b
## [1] 2 3 4 5 6
## 
## $v3
## [1] 1 3 5 7 9

But it feels like there will still be problematic edge cases even if we fix the issue with commas, with names, etc.

Anyone have any ideas/suggestions/insights/solutions/etc.??

Many thanks!

like image 967
lefft Avatar asked Apr 21 '18 21:04

lefft


1 Answers

The reason data.frame(a=1:5, b=a+1) doesn't work is a scoping issue, not an evaluation order issue.

Arguments to a function are normally evaluated in the calling frame. When you say a+1, you are referring to the variable a in the frame that made the call to data.frame, not the column that you are about to create.

dplyr::data_frame does very non-standard evaluation, so it can mix up frames as you saw. It appears to look first in the frame corresponding to the object that is under construction, and second in the usual place.

One way to use the dplyr semantics with a base function is to do both, e.g.

do.call(data.frame, as.list(dplyr::data_frame(a = 1:5, b = a+1)))

but this is kind of useless: you can convert a tibble to a dataframe directly, and this can't be used with other base functions, since it forces all arguments to the same length.

To write your list2 function, I'd recommend looking at the source of dplyr::data_frame, and do everything it does except the final conversion to a tibble. It's source is deceptively short:

function (...) 
{
    xs <- quos(..., .named = TRUE)
    as_tibble(lst_quos(xs, expand = TRUE))
} 

This is deceptive, because lst_quos is a private function in the tibble package, so you'll need your own copy of that, plus any private functions it calls, etc. Unless of course you don't mind using private functions, then here's your list2:

list2 <- function(...) {
     xs <- rlang::quos(..., .named = TRUE)
     tibble:::lst_quos(xs, expand = TRUE)
}

This will work until the tibble maintainer chooses to change lst_quos, which he's free to do without warning (since it's private). It wouldn't be acceptable code in a CRAN package because of this fragility.

like image 143
user2554330 Avatar answered Sep 21 '22 18:09

user2554330