I am having issues refactoring dplyr code in a way that preserves non-standard evaluation. Let's say I want to create a function that always selects and renames.
library(lazyeval)
library(dplyr)
df <- data.frame(a = c(1,2,3), f = c(4,5,6), lm = c(7, 8 , 9))
select_happy <- function(df, col){
  col <- lazy(col)
  fo <- interp(~x, x = col)
  select_(df, happy = fo)
}
f <- function(){
  print('foo')
}
select_happy() is written according to the answer to this post: Refactor R code when library functions use non-standard evaluation. select_happy() works on column names that are either undefined or defined in the global environment. However, it runs into issues when a column name is also the name of a function in another namespace.
select_happy(df, a)
# happy
# 1 1
# 2 2
# 3 3
select_happy(df, f)
# happy
# 1 4
# 2 5
# 3 6
select_happy(df, lm)
# Error in eval(expr, envir, enclos) (from #4) : object 'datafile' not found
environment(f)
# <environment: R_GlobalEnv>
environment(lm)
# <environment: namespace:stats>
Calling lazy() on f and lm shows a difference in the lazy objects: for lm the function definition appears in the lazy object, while for f it is just the name of the function.
lazy(f)
# <lazy>
# expr: f
# env: <environment: R_GlobalEnv>
lazy(lm)
# <lazy>
# expr: function (formula, data, subset, weights, na.action, method = "qr", ...
# env: <environment: R_GlobalEnv>
substitute() appears to work with lm.
select_happy <- function(df, col){
  col <- substitute(col)  # <- substitute() instead of lazy()
  fo <- interp(~x, x = col)
  select_(df, happy = fo)
}
select_happy(df, lm)
# happy
# 1 7
# 2 8
# 3 9
However, after reading the lazyeval vignette, it seems that lazy() should serve as a superior replacement for substitute(). Additionally, the regular select() function works just fine.
select(df, happy=lm)
# happy
# 1 7
# 2 8
# 3 9
My question is: how can I write select_happy() so that it works in all the ways that select() does? I'm having a hard time wrapping my head around the scoping and non-standard evaluation. More generally, what would be a solid strategy for programming with dplyr that could avoid these and other issues?
Edit
I tested out docendo discimus's solution and it worked great, but I would like to know if there is a way to use arguments, rather than dots, for the function. I think it is also important to be able to use interp(), because you might want to feed input into a more complicated formula, like in the post I linked to earlier. I think the core of the issue comes down to the fact that lazy_dots() is capturing the expression differently from lazy(). I would like to understand why they are behaving differently, and how to use lazy() to get the same functionality as lazy_dots().
g <- function(...){
  lazy_dots(...)
}
h <- function(x){
  lazy(x)
}
g(lm)[[1]]
# <lazy>
# expr: lm
# env: <environment: R_GlobalEnv>
h(lm)
# <lazy>
# expr: function (formula, data, subset, weights, na.action, method = "qr", ...
# env: <environment: R_GlobalEnv>
Even changing .follow_symbols to FALSE for lazy(), so that it is the same as lazy_dots(), does not work.
lazy
# function (expr, env = parent.frame(), .follow_symbols = TRUE)
# {
# .Call(make_lazy, quote(expr), environment(), .follow_symbols)
# }
# <environment: namespace:lazyeval>
lazy_dots
# function (..., .follow_symbols = FALSE)
# {
# if (nargs() == 0)
# return(structure(list(), class = "lazy_dots"))
# .Call(make_lazy_dots, environment(), .follow_symbols)
# }
# <environment: namespace:lazyeval>
h2 <- function(x){
  lazy(x, .follow_symbols = FALSE)
}
h2(lm)
# <lazy>
# expr: x
# env: <environment: 0xe4a42a8>
I just feel really kind of stuck as to what to do.
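One thing I have been experimenting with (a rough sketch only, using a variant I am calling select_happy2, so it may well not be the right approach) is building the lazy object by hand with lazy_() and substitute(), so that the captured expression stays a bare symbol, as with lazy_dots(), but still carries the calling environment:
select_happy2 <- function(df, col){
  # construct the lazy object manually: bare symbol plus the caller's environment
  col <- lazy_(substitute(col), env = parent.frame())
  fo <- interp(~x, x = col)
  select_(df, happy = fo)
}
select_happy2(df, lm)
# I would expect this to return happy = 7, 8, 9, the same as select(df, happy = lm),
# but I have not convinced myself it is robust.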
One option may be to write select_happy in almost the same way as the standard select function:
select_happy <- function(df, ...){
  select_(df, .dots = setNames(lazy_dots(...), "happy"))
}
f <- function(){
  print('foo')
}
> select_happy(df, a)
happy
1 1
2 2
3 3
>
> select_happy(df, f)
happy
1 4
2 5
3 6
>
> select_happy(df, lm)
happy
1 7
2 8
3 9
Note that the function definition of the standard select function is:
> select
function (.data, ...)
{
select_(.data, .dots = lazyeval::lazy_dots(...))
}
<environment: namespace:dplyr>
Also note that, by this definition, select_happy accepts multiple columns to be selected, but will name any additional columns "NA":
> select_happy(df, lm, a)
happy NA
1 7 1
2 8 2
3 9 3
Of course you could make some modifications for such cases, for example:
select_happy <- function(df, ...){
  dots <- lazy_dots(...)
  n <- length(dots)
  if(n == 1) newnames <- "happy" else newnames <- paste0("happy", seq_len(n))
  select_(df, .dots = setNames(dots, newnames))
}
> select_happy(df, f)
happy
1 4
2 5
3 6
> select_happy(df, lm, a)
happy1 happy2
1 7 1
2 8 2
3 9 3
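Regarding the interp() point from your edit: the objects returned by lazy_dots() are ordinary lazy objects, so you should be able to feed them into interp() to build a more complicated expression, just like interp(~x, x = col) in your question. A rough, untested sketch, using a made-up double_happy helper as an example:
double_happy <- function(df, ...){
  dots <- lazy_dots(...)
  # reuse the first captured lazy object inside a larger expression
  fo <- interp(~x * 2, x = dots[[1]])
  mutate_(df, happy2 = fo)
}
double_happy(df, lm)
# expected: a happy2 column equal to lm * 2, i.e. 14, 16, 18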