
Pass arguments to dplyr functions

I want to parameterise the following computation using dplyr that finds which values of Sepal.Length are associated with more than one value of Sepal.Width:

    library(dplyr)

    iris %>%
        group_by(Sepal.Length) %>%
        summarise(n.uniq = n_distinct(Sepal.Width)) %>%
        filter(n.uniq > 1)

Normally I would write something like this:

    not.uniq.per.group <- function(data, group.var, uniq.var) {
        data %>%
            group_by(group.var) %>%
            summarise(n.uniq = n_distinct(uniq.var)) %>%
            filter(n.uniq > 1)
    }

However, this approach throws errors because dplyr uses non-standard evaluation. How should this function be written?

asnr asked Jan 15 '15 at 23:01



2 Answers

You need to use the standard evaluation versions of the dplyr functions (just append '_' to the function names, i.e. group_by_ and summarise_) and pass the column names to your function as strings, which you then turn into symbols. To parameterise the argument of summarise_, use interp(), which is defined in the lazyeval package. Concretely:

    library(dplyr)
    library(lazyeval)

    not.uniq.per.group <- function(df, grp.var, uniq.var) {
        df %>%
            group_by_(grp.var) %>%
            summarise_(n_uniq = interp(~n_distinct(v), v = as.name(uniq.var))) %>%
            filter(n_uniq > 1)
    }

    not.uniq.per.group(iris, "Sepal.Length", "Sepal.Width")

Note that since dplyr 0.7.0 the underscore-suffixed standard evaluation functions (group_by_, summarise_, and so on) have been deprecated in favor of non-standard evaluation.

See the Programming with dplyr vignette for more information on working with non-standard evaluation.
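For readers on a current dplyr, the same string-based interface can be sketched without the underscore functions or lazyeval by subsetting the `.data` pronoun. This is only an illustration, not part of the original answer: the function name `not_uniq_per_group` and the variable `res_str` are hypothetical, and the sketch assumes a dplyr version that supports `.data[[ ]]` inside `group_by()` and `summarise()`.

```r
library(dplyr)

# Hypothetical rewrite of the function above: column names are still
# passed as strings, but are looked up through the .data pronoun.
not_uniq_per_group <- function(df, grp_var, uniq_var) {
  df %>%
    group_by(.data[[grp_var]]) %>%
    summarise(n_uniq = n_distinct(.data[[uniq_var]])) %>%
    filter(n_uniq > 1)
}

res_str <- not_uniq_per_group(iris, "Sepal.Length", "Sepal.Width")
```

Because the strings are resolved inside the data frame, a typo in a column name should produce an error rather than silently picking up a variable from the calling environment.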

asnr answered Sep 23 '22 at 00:09



Like the old dplyr versions up to 0.5, the new dplyr has facilities for both standard evaluation (SE) and nonstandard evaluation (NSE). But they are expressed differently than before.

If you want an NSE function, you pass bare expressions and use enquo to capture them as quosures. If you want an SE function, just pass quosures (or symbols) directly, then unquote them in the dplyr calls. Here is the SE solution to the question:

    library(tidyverse)
    library(rlang)

    f1 <- function(df, grp.var, uniq.var) {
        df %>%
            group_by(!!grp.var) %>%
            summarise(n_uniq = n_distinct(!!uniq.var)) %>%
            filter(n_uniq > 1)
    }

    a <- f1(iris, quo(Sepal.Length), quo(Sepal.Width))
    b <- f1(iris, sym("Sepal.Length"), sym("Sepal.Width"))
    identical(a, b)
    #> [1] TRUE

Note how the SE version enables you to work with string arguments - just turn them into symbols first using sym(). For more information, see the programming with dplyr vignette.
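The NSE variant mentioned above is not shown in the answer, so here is a minimal sketch of what it might look like. The names `f2`, `f3`, `res_nse` and `res_embrace` are hypothetical. `f2` captures the bare column names with enquo() and unquotes them with `!!`; `f3` uses the `{{ }}` embrace operator, which is assumed to be available (rlang 0.4.0 or later) and combines both steps.

```r
library(dplyr)
library(rlang)

# NSE counterpart of f1: the caller passes bare column names.
f2 <- function(df, grp.var, uniq.var) {
  grp.var <- enquo(grp.var)    # capture the expression as a quosure
  uniq.var <- enquo(uniq.var)
  df %>%
    group_by(!!grp.var) %>%
    summarise(n_uniq = n_distinct(!!uniq.var)) %>%
    filter(n_uniq > 1)
}

# Same idea with the embrace operator, which quotes and unquotes in one step.
f3 <- function(df, grp.var, uniq.var) {
  df %>%
    group_by({{ grp.var }}) %>%
    summarise(n_uniq = n_distinct({{ uniq.var }})) %>%
    filter(n_uniq > 1)
}

res_nse <- f2(iris, Sepal.Length, Sepal.Width)
res_embrace <- f3(iris, Sepal.Length, Sepal.Width)
```

Both functions are called the same way as the dplyr verbs themselves, with unquoted column names, which is convenient interactively but means string arguments need the SE form shown earlier.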

Paul answered Sep 26 '22 at 00:09
