Programming with dplyr and lazyeval

I am having trouble refactoring dplyr code in a way that preserves non-standard evaluation. Let's say I want to create a function that always selects a column and renames it.

library(lazyeval)
library(dplyr)

df <- data.frame(a = c(1,2,3), f = c(4,5,6), lm = c(7, 8 , 9))

select_happy<- function(df, col){
    col <- lazy(col)
    fo <- interp(~x, x=col)
    select_(df, happy=fo)
}

f <- function(){
    print('foo')
}

select_happy() is written according to the answer to the post "Refactor R code when library functions use non-standard evaluation". It works for column names that are either undefined or defined in the global environment. However, it runs into issues when a column name is also the name of a function in another namespace.

select_happy(df, a)
#   happy
# 1     1
# 2     2
# 3     3

select_happy(df, f)
#   happy
# 1     4
# 2     5
# 3     6

select_happy(df, lm)
# Error in eval(expr, envir, enclos) (from #4) : object 'datafile' not found

environment(f)
# <environment: R_GlobalEnv>

environment(lm)
# <environment: namespace:stats>

Calling lazy() on f and lm shows a difference in the resulting lazy objects: for lm the expression holds the function definition, while for f it holds just the name of the function.

lazy(f)
# <lazy>
#   expr: f
#   env:  <environment: R_GlobalEnv>

lazy(lm)
# <lazy>
#   expr: function (formula, data, subset, weights, na.action, method = "qr",  ...
#   env:  <environment: R_GlobalEnv>

Using substitute() instead of lazy() appears to work with lm.

select_happy <- function(df, col){
    col <- substitute(col) # <- substitute() instead of lazy()
    fo <- interp(~x, x=col)
    select_(df, happy=fo)
}

select_happy(df, lm)
#   happy
# 1     7 
# 2     8
# 3     9

However, after reading the lazyeval vignette, it seems that lazy() should serve as a superior replacement for substitute(). Additionally, the regular select() function works just fine.

select(df, happy=lm)
#   happy
# 1     7
# 2     8
# 3     9

My question is how can I write select_happy() so that it works in all the ways that select() does? I'm having a hard time wrapping my head around the scoping and non-standard evaluation. More generally, what would be a solid strategy for programming with dplyr that could avoid these and other issues?

Edit

I tested out docendo discimus's solution and it worked great, but I would like to know if there is a way to use named arguments, rather than dots, for the function. I think it is also important to be able to use interp(), because you might want to feed input into a more complicated formula, like in the post I linked to earlier. I think the core of the issue comes down to the fact that lazy_dots() captures the expression differently from lazy(). I would like to understand why they behave differently, and how to use lazy() to get the same functionality as lazy_dots().

g <- function(...){
    lazy_dots(...)
}

h <-  function(x){
    lazy(x)
}

g(lm)[[1]]
# <lazy>
#   expr: lm
#   env:  <environment: R_GlobalEnv>
h(lm)
# <lazy>
#   expr: function (formula, data, subset, weights, na.action, method = "qr",  ...
#   env:  <environment: R_GlobalEnv> 
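
For comparison, here is a minimal base R sketch of capturing a named argument and capturing dots with substitute(). The helper names capture_arg() and capture_dots() are hypothetical and not part of lazyeval. In both cases substitute() records only the symbol the caller typed and never looks it up, which is consistent with substitute() working for lm above.

```r
# Hypothetical helper: base R analogue of h(), capturing a named argument
# as an unevaluated expression.
capture_arg <- function(x) {
  substitute(x)
}

# Hypothetical helper: base R analogue of g(), capturing every expression
# passed through the dots as a list of unevaluated expressions.
capture_dots <- function(...) {
  # substitute(list(...)) yields the call list(<expr1>, <expr2>, ...);
  # dropping the first element removes the `list` symbol itself.
  as.list(substitute(list(...)))[-1]
}

captured_arg  <- capture_arg(lm)      # the symbol lm, not the function
captured_dots <- capture_dots(lm, a)  # a list holding the symbols lm and a
```

Neither helper records the calling environment, which is the extra information a lazy object carries, so this is only an illustration of the capture step.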

Even changing .follow_symbols to FALSE for lazy(), so that it matches the default of lazy_dots(), does not work.

lazy
# function (expr, env = parent.frame(), .follow_symbols = TRUE) 
# {
#     .Call(make_lazy, quote(expr), environment(), .follow_symbols)
# }
# <environment: namespace:lazyeval>

lazy_dots
# function (..., .follow_symbols = FALSE) 
# {
#     if (nargs() == 0) 
#         return(structure(list(), class = "lazy_dots"))
#     .Call(make_lazy_dots, environment(), .follow_symbols)
# }
# <environment: namespace:lazyeval>


h2 <-  function(x){
    lazy(x, .follow_symbols=FALSE)
}

h2(lm)
# <lazy>
#  expr: x
#  env:  <environment: 0xe4a42a8>

I just feel really kind of stuck as to what to do.

asked Feb 04 '16 08:02 by Mir Henglin


1 Answer

One option may be to write select_happy almost the same way as the standard select function:

select_happy<- function(df, ...){
  select_(df, .dots = setNames(lazy_dots(...), "happy"))
}

f <- function(){
  print('foo')
}

> select_happy(df, a)
  happy
1     1
2     2
3     3
> 
> select_happy(df, f)
  happy
1     4
2     5
3     6
> 
> select_happy(df, lm)
  happy
1     7
2     8
3     9

Note that the function definition of the standard select function is:

> select
function (.data, ...) 
{
    select_(.data, .dots = lazyeval::lazy_dots(...))
}
<environment: namespace:dplyr>

Also note that by this definition, select_happy accepts multiple columns to be selected, but will name any additional columns "NA":

> select_happy(df, lm, a)
  happy NA
1     7  1
2     8  2
3     9  3
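
The "NA" name comes from how setNames() handles a names vector shorter than the object: the missing names are filled with NA. A minimal base R sketch, using a plain list of symbols as a stand-in for the lazy_dots object (an assumption made so the example runs without lazyeval):

```r
# Stand-in for lazy_dots(lm, a): a plain list of two captured symbols.
dots <- list(quote(lm), quote(a))

# Only one name is supplied for two elements, so the second name is NA.
named <- setNames(dots, "happy")
dot_names <- names(named)
```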

Of course you could make some modifications for such cases, for example:

select_happy<- function(df, ...){
  dots <- lazy_dots(...)
  n <- length(dots)
  if(n == 1) newnames <- "happy" else newnames <- paste0("happy", seq_len(n))
  select_(df, .dots = setNames(dots, newnames))
}

> select_happy(df, f)
  happy
1     4
2     5
3     6

> select_happy(df, lm, a)
  happy1 happy2
1      7      1
2      8      2
3      9      3
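
If a single named argument is preferred over dots, as the question's edit asks, one possible base R sketch is shown below. It is not part of the original answer and sidesteps lazyeval and select_() entirely. The function name select_happy_base() is hypothetical, and it assumes the argument is always a bare column name, so it does not support select helpers or ranges such as a:f.

```r
df <- data.frame(a = c(1, 2, 3), f = c(4, 5, 6), lm = c(7, 8, 9))

# Hypothetical base R variant: capture the bare column name with
# substitute(), then subset and rename without evaluating the symbol.
select_happy_base <- function(df, col) {
  col_name <- deparse(substitute(col))
  # Fail early if the captured name is not a column of df.
  stopifnot(col_name %in% names(df))
  out <- df[, col_name, drop = FALSE]
  names(out) <- "happy"
  out
}

res <- select_happy_base(df, lm)
```

Because the symbol is only deparsed and never looked up, it makes no difference whether lm or f also names a function somewhere on the search path.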
answered Sep 29 '22 18:09 by talat