I'd like to apply a list of programmatically selected functions to each column of a data frame using dplyr. For illustration purposes, here is my list of functions:
fun_list <- lapply(iris[-5], function(x) if(var(x) > 0.7) median else mean)
I thought this would work:
iris %>% group_by(Species) %>% summarise_each_(funs_(fun_list), names(iris)[-5])
based on ?funs_, which states that the arguments should be, among other things:
A list of functions specified by ... The function itself, mean
But this fails with the error:
Error in UseMethod("as.lazy") :
no applicable method for 'as.lazy' applied to an object of class "function"
It seems that funs_ is actually expecting a list of symbols that correspond to functions defined in the appropriate environment, instead of actual functions. In my application, though, I only get the functions, not their symbol names (besides, the functions could well be anonymous).
Is there a way to pass the actual functions to summarise_each with dplyr? Note I'm specifically looking for a dplyr answer, as I know how to solve this problem with other tools.
If fun_list is a list of functions, you can convert it to a list of "lazy objects" before using it in dplyr functions.
library(lazyeval)
fun_list2 <- lapply(fun_list, function(f) lazy(f(.)))
or
fun_list2 <- lapply(fun_list, function(f) lazy_(quote(f), env = environment()))
But I am not sure this method is 100% watertight.
Based on comments (to have one function per column):
dispatch <- lazy_(quote((fun_list[[as.character(substitute(.))]](.))), env = environment())
iris %>% group_by(Species) %>% summarise_each_(funs_(dispatch), names(iris)[-5])
The idea is to call summarise_each_ not with a list of functions but with a single dispatch function. For each variable, this function looks up the matching function in the original fun_list (by the variable's name!) and applies it to that variable.
The solution works as long as the names of the function list match the names of the variables.
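To make the name-based lookup concrete, here is a minimal base R sketch of the same idea, without dplyr or lazyeval. The helper summarise_by_name and its argument fns are hypothetical names introduced only for this illustration; the sketch assumes, as above, that the names of fun_list match the column names.

```r
# Pick one summary function per column (same rule as in the question).
fun_list <- lapply(iris[-5], function(x) if (var(x) > 0.7) median else mean)

# Hypothetical helper: for each name in fns, apply fns[[name]] to the
# column of df with the same name.
summarise_by_name <- function(df, fns) {
  vapply(names(fns), function(nm) fns[[nm]](df[[nm]]), numeric(1))
}

# Split the numeric columns by species, dispatch per column, and
# transpose so that rows are species and columns are variables.
result <- t(sapply(split(iris[-5], iris$Species), summarise_by_name, fns = fun_list))
print(result)
```

The lookup fns[[nm]] plays the role of fun_list[[as.character(substitute(.))]] in the dispatch expression: the column name is the key that selects the function.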
It is also possible to define the dispatch and the function list dynamically (in this case the environment is not the global one):
get_dispatch <- function(fun_list) {
  return(lazy_(quote((fun_list[[as.character(substitute(.))]](.))), env = environment()))
}
dispatch <- get_dispatch(lapply(iris[-5], function(x) if(var(x) > 0.7) median else mean))
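As a side note, summarise_each_ and funs_ have since been deprecated in dplyr. Assuming dplyr 1.0.0 or later is installed, the same name-based dispatch can probably be sketched with across() and cur_column(); this is an alternative formulation under that assumption, not the method used above, and the variable res is just an illustrative name.

```r
library(dplyr)

# Same function list as in the question: one function per numeric column.
fun_list <- lapply(iris[-5], function(x) if (var(x) > 0.7) median else mean)

# cur_column() returns the name of the column currently being processed
# inside across(), so it can be used as the key into fun_list.
res <- iris %>%
  group_by(Species) %>%
  summarise(across(everything(), ~ fun_list[[cur_column()]](.x)))
print(res)
```

Because the functions are looked up in the list at evaluation time, they can be anonymous; only the list names need to match the column names.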