
Use list of functions with dplyr::summarize_each_

Tags:

r

dplyr

I'd like to apply a list of programmatically selected functions to each column of a data frame using dplyr. For illustration purposes, here is my list of functions:

fun_list <- lapply(iris[-5], function(x) if(var(x) > 0.7) median else mean)
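To make the selection concrete, here is a minimal base R sketch that inspects which function ended up attached to each column. The names `col_vars` and `uses_median` are illustrative only and are not part of the original question.

```r
# Build the list as in the question: one function per numeric column of iris.
fun_list <- lapply(iris[-5], function(x) if (var(x) > 0.7) median else mean)

# Column variances that drive the choice between median and mean.
col_vars <- vapply(iris[-5], var, numeric(1))

# TRUE where median was selected, FALSE where mean was selected.
uses_median <- vapply(fun_list, function(f) identical(f, median), logical(1))

print(round(col_vars, 3))
print(uses_median)
```

With the built-in iris data, only Petal.Length has a variance above 0.7, so it is the only column paired with median. Note that the list is named after the columns, which matters for any approach that looks functions up by column name.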

I thought this would work:

iris %>% group_by(Species) %>% summarise_each_(funs_(fun_list), names(iris)[-5])

based on ?funs_ which states the arguments should be, among other things:

A list of functions specified by ... The function itself, mean

But this fails with error:

Error in UseMethod("as.lazy") : 
  no applicable method for 'as.lazy' applied to an object of class "function"

It seems that funs_ is actually expecting a list of symbols that correspond to functions defined in the appropriate environment, instead of actual functions. In my application though I only get the functions, not their symbol names (besides, the functions could well be anonymous).

Is there a way to pass the actual functions to summarise_each with dplyr? Note I'm specifically looking for a dplyr answer as I know how to solve this problem with other tools.

BrodieG asked Apr 01 '15 22:04



1 Answer

If fun_list is a list of functions, you can convert it to a list of "lazy objects" before using it in dplyr functions.

library(lazyeval)

fun_list2 <- lapply(fun_list, function(f) lazy(f(.)))

or

fun_list2 <- lapply(fun_list, function(f) lazy_(quote(f), env = environment()))

However, I am not sure this method is fully robust in every case.

Update

Based on comments (to have one function per column):

dispatch <- lazy_(quote((fun_list[[as.character(substitute(.))]](.))), env = environment())

iris %>% group_by(Species) %>% summarise_each_(funs_(dispatch), names(iris)[-5])

The idea is to call summarise_each_ with a single dispatch expression instead of a list of functions. For each variable, the dispatch expression looks up the matching function in the original fun_list by the variable's name and applies it to that variable.

This solution works only if the names of the function list match the names of the variables.
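The name-based lookup can be illustrated without dplyr. The following is a rough base R sketch of the same dispatch idea, not the answer's method itself; `cols`, `by_species`, and `result` are hypothetical names introduced here for illustration.

```r
# Same function list as in the question.
fun_list <- lapply(iris[-5], function(x) if (var(x) > 0.7) median else mean)
cols <- names(iris)[-5]

# The dispatch approach relies on this: every column has a same-named entry.
stopifnot(all(cols %in% names(fun_list)))

# For each Species group, look up each column's function by name and apply it.
by_species <- lapply(split(iris[cols], iris$Species), function(grp) {
  vapply(cols, function(nm) fun_list[[nm]](grp[[nm]]), numeric(1))
})

# One row per species, one column per variable.
result <- do.call(rbind, by_species)
print(result)
```

If a column had no same-named entry in the list, the `stopifnot()` check would fail early, which is the same precondition the dispatch expression depends on.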

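It is also possible to build the dispatch expression inside a function, so that the function list is supplied dynamically. In this case fun_list is looked up in that function's environment rather than in the global environment: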

get_dispatch <- function(fun_list) {
    return(lazy_(quote((fun_list[[as.character(substitute(.))]](.))), env = environment())) 
}

dispatch <- get_dispatch(lapply(iris[-5], function(x) if(var(x) > 0.7) median else mean))
bergant answered Nov 15 '22 06:11
