Pass a function to ddply wrapped inside a function, as part of that function's call

Tags:

r

plyr

I am hoping to use ddply within a function to summarise groups based on a user-determined summary statistic (e.g. the mean, median, min or max), by passing the name of the summary function to apply as an argument in the function call. However, I'm not sure how to pass this to ddply.

Simple example:

library(plyr)
test.df<-data.frame(group=c("a","a","b","b"),value=c(1,5,5,15))
ddply(test.df,.(group),summarise, mean=mean(value, na.rm=TRUE))

How could I set this up along the lines of the code below, with the relevant function passed to ddply (and, in addition, inside a wrapper function, although that should be straightforward once the first problem is solved)? Note that each summary measure (mean, etc.) will require na.rm = TRUE. I could write my own replacement function for each summary statistic, but that seems overly complex.

Desired:

#fn<-"mean"     
#ddply(test.df,.(group),summarise, fn=fn(value, na.rm=TRUE))

Thanks for any help people can provide.

EDIT: Thanks, all, for these responses. I initially thought leaving out the quotes worked; however, neither that approach nor getFunction or match.fun works once fn is specified as an argument of a wrapper function. What I'm actually hoping to get working is something along the lines of the code below (which returns an error). Apologies for not providing a more thorough example in the first place.

test.df <- data.frame(group = c("a", "a", "b", "b"), value = c(1, 5, 5, 15))
my.fun <- function(df, fn = "mean") {
  summary <- ddply(df, .(group), summarise, summary = match.fun(fn)(value, na.rm = TRUE))
  return(summary)
}
my.fun(test.df, fn = "mean")
asked Mar 22 '23 by nickb


1 Answer

The function that you provided in the question looks like it should work. (And indeed it took me a few moments to remember why it doesn't.) Here it is again, slightly rewritten for clarity (Iwastemptedtoansweryourquestionwithoutanyspacesiniteither;)

library(plyr)

df <- data.frame(
  group = c("a", "a", "b", "b"),
  value = c(1, 5, 5, 15)
)

my_fun <- function(df, fn = "mean") {
  # Resolve a name such as "mean" to the function object itself
  fn <- match.fun(fn)
  ddply(df, .(group), summarise, summary = fn(value, na.rm = TRUE))
}

The reason it doesn't work is a little subtle, but it comes down to how scoping (the process of looking up the values of variables from their names) works. summarise() uses non-standard evaluation to look up values first in the data frame and then in the environment from which it was called. That works for value, which is a column of the data frame, but not for fn, because fn isn't present where summarise() is called, i.e. inside ddply() rather than inside my_fun().
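To make the scoping point concrete, here is a minimal base-R sketch that reproduces the same failure without plyr. toy_summarise() and toy_ddply() are hypothetical stand-ins, not plyr's actual implementation; the sketch only assumes that summarise() evaluates its arguments in the data frame first and then in its caller's environment, as described above.

```r
# Hypothetical stand-ins, NOT plyr's real code.
# toy_summarise() evaluates `expr` using the columns of .data first, then the
# environment of whichever function *called* toy_summarise().
toy_summarise <- function(.data, expr) {
  eval(substitute(expr), .data, parent.frame())
}

# toy_ddply() calls .fun from its own frame, much as ddply() calls
# summarise() from inside plyr rather than from your function.
toy_ddply <- function(.data, .fun, ...) {
  .fun(.data, ...)
}

d <- data.frame(value = c(1, 5))

broken <- function(df, fn = "mean") {
  fn <- match.fun(fn)
  # fn lives in this frame, but toy_summarise() searches toy_ddply()'s frame
  # and its enclosures, so the lookup of fn fails
  toy_ddply(df, toy_summarise, fn(value, na.rm = TRUE))
}

fixed <- function(df, fn = "mean") {
  fn <- match.fun(fn)
  # The anonymous function is created here, so toy_summarise() is called from
  # a frame whose enclosure is fixed()'s frame, where fn is visible
  toy_ddply(df, function(piece) toy_summarise(piece, fn(value, na.rm = TRUE)))
}

try(broken(d))  # error: could not find function "fn"
fixed(d)        # [1] 3
```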

There are two solutions:

  1. Use the here() function, which was added to plyr to work around this problem:

    my_fun <- function(df, fn = "mean") {
      fn <- match.fun(fn)
      ddply(df, .(group), here(summarise), summary = fn(value, na.rm = TRUE))
    }
    my_fun(df, "mean")
    
  2. Be slightly less concise and use an explicit function:

    my_fun <- function(df, fn = "mean") {
      fn <- match.fun(fn)
      ddply(df, .(group), function(df) {
        summarise(df, summary = fn(value, na.rm = TRUE))
      })
    }
    my_fun(df, "mean")
    
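If plyr itself isn't a requirement, a different technique sidesteps non-standard evaluation altogether: base R's tapply() takes the summary function as an ordinary argument and forwards extra arguments such as na.rm = TRUE to it. This is a sketch of an alternative, not either of the plyr solutions above; my_fun_base() is a hypothetical name.

```r
df <- data.frame(
  group = c("a", "a", "b", "b"),
  value = c(1, 5, 5, 15)
)

# Hypothetical base-R wrapper: fn is passed to tapply() as a value,
# so nothing has to be looked up by name later
my_fun_base <- function(df, fn = "mean") {
  fn <- match.fun(fn)
  # Arguments after FUN (here na.rm) are passed on to fn for each group
  res <- tapply(df$value, df$group, fn, na.rm = TRUE)
  data.frame(group = names(res), summary = as.vector(res))
}

my_fun_base(df, "mean")  # summary: 3 for "a", 10 for "b"
my_fun_base(df, "max")
```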

I now understand how I could have avoided this problem in the first place in the design of plyr, but it would require some custom C/C++ code. It's fixed in dplyr, but the fix is unlikely to be ported back to plyr because it might break existing code.
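For comparison, here is a sketch of the same wrapper in dplyr, assuming a reasonably recent version of dplyr is installed. Because dplyr's summarise() records the environment in which each expression was written, the local fn is found without here() or an explicit anonymous function. my_fun_dplyr() is a hypothetical name.

```r
library(dplyr)

df <- data.frame(
  group = c("a", "a", "b", "b"),
  value = c(1, 5, 5, 15)
)

# Hypothetical dplyr version of my_fun()
my_fun_dplyr <- function(df, fn = "mean") {
  fn <- match.fun(fn)
  # summarise() captures each expression together with this function's
  # environment, so fn is visible when the expression is evaluated
  summarise(group_by(df, group), summary = fn(value, na.rm = TRUE))
}

my_fun_dplyr(df, "mean")
```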

answered Apr 07 '23 by hadley