Non-standard evaluation is really handy when using dplyr's verbs, but it can be problematic when those verbs are used with function arguments. For example, let us say that I want to create a function that gives me the number of rows for a given species.
# Load packages and prepare data
library(dplyr)
library(lazyeval)
# I prefer lowercase column names
names(iris) <- tolower(names(iris))
# Number of rows for all species
nrow(iris)
# [1] 150
This function doesn't work as expected because species is interpreted in the context of the iris data frame instead of being interpreted in the context of the function argument:
nrowspecies0 <- function(dtf, species){
    dtf %>%
        filter(species == species) %>%
        nrow()
}
nrowspecies0(iris, species = "versicolor")
# [1] 150
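The tautology can be seen without dplyr. Here is a minimal base R sketch, using with() as a stand-in for the data-masked evaluation that filter() performs (the toy data frame df is made up for illustration):

```r
# Toy data frame, for illustration only
df <- data.frame(species = c("setosa", "versicolor", "virginica"))
species <- "versicolor"

# Inside with(), `species` is looked up in df first, so both sides
# of == refer to the column and every row compares equal to itself
with(df, species == species)
# [1] TRUE TRUE TRUE
```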
To work around non-standard evaluation, I usually append an underscore to the argument name:
nrowspecies1 <- function(dtf, species_){
    dtf %>%
        filter(species == species_) %>%
        nrow()
}
nrowspecies1(iris, species_ = "versicolor")
# [1] 50
# Because of partial argument matching, the argument
# name species works too
nrowspecies1(iris, species = "versicolor")
# [1] 50
It is not completely satisfactory, since it changes the name of the function argument to something less user-friendly, or else relies on partial argument matching, which I'm afraid is not good programming practice. To keep a nice argument name, I could do:
nrowspecies2 <- function(dtf, species){
    species_ <- species
    dtf %>%
        filter(species == species_) %>%
        nrow()
}
nrowspecies2(iris, species = "versicolor")
# [1] 50
Another way to work around non-standard evaluation, based on this answer: interp() interprets species in the context of the function environment:
nrowspecies3 <- function(dtf, species){
    dtf %>%
        filter_(interp(~species == with_species,
                       with_species = species)) %>%
        nrow()
}
nrowspecies3(iris, species = "versicolor")
# [1] 50
Considering the three functions above, what is the preferred (most robust) way to implement this filter function? Are there any other ways?
The answer from @eddi is correct about what's going on here.
I'm writing another answer that addresses the larger request of how to write functions using dplyr verbs. You'll note that, ultimately, it uses something like nrowspecies2 to avoid the species == species tautology.
To write a function wrapping dplyr verb(s) that will work with NSE, write two functions:
First, write a version that requires quoted inputs, using lazyeval and an SE version of the dplyr verb. So in this case, filter_.
nrowspecies_robust_ <- function(data, species){
    species_ <- lazyeval::as.lazy(species)
    condition <- ~ species == species_ # *
    tmp <- dplyr::filter_(data, condition) # **
    nrow(tmp)
}
nrowspecies_robust_(iris, ~versicolor)
Second, make a version that uses NSE:
nrowspecies_robust <- function(data, species) {
    species <- lazyeval::lazy(species)
    nrowspecies_robust_(data, species)
}
nrowspecies_robust(iris, versicolor)
* = if you want to do something more complex, you may need to use lazyeval::interp here, as in the tips linked below
** = also, if you need to change output names, see the .dots argument
For the above, I followed some tips from Hadley.
Another good resource is the dplyr vignette on NSE, which illustrates .dots, interp, and other functions from the lazyeval package.
For even more details on lazyeval, see its vignette.
For a thorough discussion of the base R tools for working with NSE (many of which lazyeval helps you avoid), see the chapter on NSE in Advanced R.
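A note for readers on newer dplyr: the underscored verbs such as filter_() have been deprecated since dplyr 0.7.0, in favour of tidy evaluation from rlang. Below is a minimal sketch of the same row-counting function, assuming a recent dplyr (1.0 or later) and using the .data and .env pronouns to say explicitly which species is the column and which is the argument. The names nrowspecies_pronoun and iris_lc are made up for illustration and do not come from the answers above.

```r
library(dplyr)

# Lowercase copy of iris, so the built-in dataset is left untouched
iris_lc <- setNames(iris, tolower(names(iris)))

# Hypothetical function name, for illustration only
nrowspecies_pronoun <- function(data, species) {
    data %>%
        # .data$species is the column; .env$species is the function argument
        filter(.data$species == .env$species) %>%
        nrow()
}

nrowspecies_pronoun(iris_lc, species = "versicolor")
# [1] 50
```

This keeps the user-friendly argument name without the intermediate species_ variable, at the cost of requiring a dplyr version that supports the pronouns.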