I am having issues refactoring dplyr code in a way that preserves non-standard evaluation. Let's say I want to create a function that always selects and renames.
library(lazyeval)
library(dplyr)
df <- data.frame(a = c(1,2,3), f = c(4,5,6), lm = c(7, 8 , 9))
select_happy <- function(df, col){
  col <- lazy(col)
  fo <- interp(~x, x = col)
  select_(df, happy = fo)
}
f <- function(){
  print('foo')
}
select_happy() is written according to the answer to this post: Refactor R code when library functions use non-standard evaluation. select_happy() works on column names that are either undefined or defined in the global environment. However, it runs into issues when a column name is also the name of a function in another namespace.
select_happy(df, a)
# happy
# 1 1
# 2 2
# 3 3
select_happy(df, f)
# happy
# 1 4
# 2 5
# 3 6
select_happy(df, lm)
# Error in eval(expr, envir, enclos) (from #4) : object 'datafile' not found
environment(f)
# <environment: R_GlobalEnv>
environment(lm)
# <environment: namespace:stats>
Calling lazy() on f and lm shows a difference in the lazy objects: for lm the function definition appears in the lazy object, while for f it is just the name of the function.
lazy(f)
# <lazy>
# expr: f
# env: <environment: R_GlobalEnv>
lazy(lm)
# <lazy>
# expr: function (formula, data, subset, weights, na.action, method = "qr", ...
# env: <environment: R_GlobalEnv>
substitute() appears to work with lm.
select_happy <- function(df, col){
  col <- substitute(col)  # <- substitute() instead of lazy()
  fo <- interp(~x, x = col)
  select_(df, happy = fo)
}
select_happy(df, lm)
# happy
# 1 7
# 2 8
# 3 9
However, after reading the lazyeval vignette, it seems that lazy() should serve as a superior replacement for substitute(). Additionally, the regular select() function works just fine.
select(df, happy=lm)
# happy
# 1 7
# 2 8
# 3 9
My question is: how can I write select_happy() so that it works in all the ways that select() does? I'm having a hard time wrapping my head around the scoping and non-standard evaluation. More generally, what would be a solid strategy for programming with dplyr that could avoid these and other issues?
Edit
I tested out docendo discimus's solution and it worked great, but I would like to know if there is a way to use arguments, rather than dots, for the function. I think it is also important to be able to use interp(), because you might want to feed input into a more complicated formula, like in the post I linked to earlier. I think the core of the issue comes down to the fact that lazy_dots() is capturing the expression differently from lazy(). I would like to understand why they are behaving differently, and how to use lazy() to get the same functionality as lazy_dots().
g <- function(...){
  lazy_dots(...)
}
h <- function(x){
  lazy(x)
}
g(lm)[[1]]
# <lazy>
# expr: lm
# env: <environment: R_GlobalEnv>
h(lm)
# <lazy>
# expr: function (formula, data, subset, weights, na.action, method = "qr", ...
# env: <environment: R_GlobalEnv>
Even changing .follow_symbols to FALSE for lazy(), so that it is the same as lazy_dots(), does not work.
lazy
# function (expr, env = parent.frame(), .follow_symbols = TRUE)
# {
# .Call(make_lazy, quote(expr), environment(), .follow_symbols)
# }
# <environment: namespace:lazyeval>
lazy_dots
# function (..., .follow_symbols = FALSE)
# {
# if (nargs() == 0)
# return(structure(list(), class = "lazy_dots"))
# .Call(make_lazy_dots, environment(), .follow_symbols)
# }
# <environment: namespace:lazyeval>
h2 <- function(x){
  lazy(x, .follow_symbols = FALSE)
}
h2(lm)
# <lazy>
# expr: x
# env: <environment: 0xe4a42a8>
I just feel really kind of stuck as to what to do.
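One thing I have been experimenting with (a rough sketch only, using a variant I am calling select_happy2, so it may well not be the right approach) is building the lazy object by hand with lazy_() and substitute(), so that the captured expression stays a bare symbol, as with lazy_dots(), but still carries the calling environment:
select_happy2 <- function(df, col){
  # construct the lazy object manually: bare symbol plus the caller's environment
  col <- lazy_(substitute(col), env = parent.frame())
  fo <- interp(~x, x = col)
  select_(df, happy = fo)
}
select_happy2(df, lm)
# I would expect this to return happy = 7, 8, 9, the same as select(df, happy = lm),
# but I have not convinced myself it is robust.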
One option may be to write select_happy in almost the same way as the standard select function:
select_happy <- function(df, ...){
  select_(df, .dots = setNames(lazy_dots(...), "happy"))
}
f <- function(){
  print('foo')
}
> select_happy(df, a)
happy
1 1
2 2
3 3
>
> select_happy(df, f)
happy
1 4
2 5
3 6
>
> select_happy(df, lm)
happy
1 7
2 8
3 9
Note that the function definition of the standard select function is:
> select
function (.data, ...)
{
select_(.data, .dots = lazyeval::lazy_dots(...))
}
<environment: namespace:dplyr>
Also note that, by this definition, select_happy accepts multiple columns to be selected, but will name any additional columns "NA":
> select_happy(df, lm, a)
happy NA
1 7 1
2 8 2
3 9 3
Of course you could make some modifications for such cases, for example:
select_happy <- function(df, ...){
  dots <- lazy_dots(...)
  n <- length(dots)
  if(n == 1) newnames <- "happy" else newnames <- paste0("happy", seq_len(n))
  select_(df, .dots = setNames(dots, newnames))
}
> select_happy(df, f)
happy
1 4
2 5
3 6
> select_happy(df, lm, a)
happy1 happy2
1 7 1
2 8 2
3 9 3
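Regarding the interp() point from your edit: the objects returned by lazy_dots() are ordinary lazy objects, so you should be able to feed them into interp() to build a more complicated expression, just like interp(~x, x = col) in your question. A rough, untested sketch, using a made-up double_happy helper as an example:
double_happy <- function(df, ...){
  dots <- lazy_dots(...)
  # reuse the first captured lazy object inside a larger expression
  fo <- interp(~x * 2, x = dots[[1]])
  mutate_(df, happy2 = fo)
}
double_happy(df, lm)
# expected: a happy2 column equal to lm * 2, i.e. 14, 16, 18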