I have some R code that looks like this:
library(dplyr)
library(datasets)
iris %.% group_by(Species) %.% filter(rank(Petal.Length, ties.method = 'random')<=2) %.% ungroup()
Giving:
Source: local data frame [6 x 5]
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 4.3 3.0 1.1 0.1 setosa
2 4.6 3.6 1.0 0.2 setosa
3 5.0 2.3 3.3 1.0 versicolor
4 5.1 2.5 3.0 1.1 versicolor
5 4.9 2.5 4.5 1.7 virginica
6 6.0 3.0 4.8 1.8 virginica
This groups by species, and for each group keeps only the two with the shortest Petal.Length. I have some duplication in my code, because I do this several times for different columns and numbers. E.g.:
iris %.% group_by(Species) %.% filter(rank(Petal.Length, ties.method = 'random')<=2) %.% ungroup()
iris %.% group_by(Species) %.% filter(rank(-Petal.Length, ties.method = 'random')<=2) %.% ungroup()
iris %.% group_by(Species) %.% filter(rank(Petal.Width, ties.method = 'random')<=3) %.% ungroup()
iris %.% group_by(Species) %.% filter(rank(-Petal.Width, ties.method = 'random')<=3) %.% ungroup()
I want to extract this into a function. The naive approach doesn't work:
keep_min_n_by_species <- function(expr, n) {
iris %.% group_by(Species) %.% filter(rank(expr, ties.method = 'random') <= n) %.% ungroup()
}
keep_min_n_by_species(Petal.Width, 2)
Error in filter_impl(.data, dots(...), environment()) :
object 'Petal.Width' not found
As I understand it, the expression rank(Petal.Length, ties.method = 'random') <= 2
is evaluated in a different context, introduced by the filter
function, that provides a meaning for the Petal.Length
expression. I can't just swap in a variable for Petal.Length, because it will be evaluated in the wrong context. I've tried using different combinations of substitute
and eval
, having read this page: Non-standard evaluation. I can't figure out an appropriate combination. I think the problem might be that I don't just want to pass through an expression from the caller (Petal.Length
) through to filter
for it to evaluate - I want to construct a new bigger expression (rank(Petal.Length, ties.method = 'random') <= 2
) and then pass that whole expression through to filter
for it to evaluate.
dplyr
version 0.3 is beginning to address this using the lazyeval
package, as @baptiste mentioned, and a new family of functions that use standard evaluation (same function names as the NSE versions, but ending in _
). There is a vignette here: https://github.com/hadley/dplyr/blob/master/vignettes/nse.Rmd
All that being said, I don't know best practices for what you're trying to do (though I'm trying to do the same thing). I have something working, but like I said, I don't know if it's the best way to do it. Note the use of filter_()
instead of filter()
, and passing in the argument as a quoted character string:
devtools::install_github("hadley/dplyr")
devtools::install_github("hadley/lazyeval")
library(dplyr)
library(lazyeval)
keep_min_n_by_species <- function(expr, n, rev = FALSE) {
iris %>%
group_by(Species) %>%
filter_(interp(~rank(if (rev) -x else x, ties.method = 'random') <= y, # filter_, not filter
x = as.name(expr), y = n)) %>%
ungroup()
}
keep_min_n_by_species("Petal.Width", 3) # "Petal.Width" as character string
keep_min_n_by_species("Petal.Width", 3, rev = TRUE)
Update based on @hadley's comment:
keep_min_n_by_species <- function(expr, n) {
expr <- lazy(expr)
formula <- interp(~rank(x, ties.method = 'random') <= y,
x = expr, y = n)
iris %>%
group_by(Species) %>%
filter_(formula) %>%
ungroup()
}
keep_min_n_by_species(Petal.Width, 3)
keep_min_n_by_species(-Petal.Width, 3)
How about
keep_min_n_by_species <- function(expr, n) {
mc <- match.call()
fx <- bquote(rank(.(mc$expr), ties.method = 'random') <= .(mc$n))
iris %.% group_by(Species) %.% filter(fx) %.% ungroup()
}
That seems to allow all the statements to run without error
keep_min_n_by_species(Petal.Width, 2)
keep_min_n_by_species(-Petal.Width, 2)
keep_min_n_by_species(Petal.Width, 3)
keep_min_n_by_species(-Petal.Width, 3)
The idea is that we use match.call()
to capture the unevaluated expressions passed to the function. Then we use bquote()
to build the filter as a call object.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With