Using ddply inside a function

Tags: r, plyr

I'm trying to write a function that uses ddply internally, but I can't get it to work. Below is a dummy example that reproduces the problem. Does this have anything to do with this bug?

library(plyr)     # provides ddply and .()
library(ggplot2)  # provides the diamonds data set
data(diamonds)

foo <- function(data, fac1, fac2, bar) {
  res <- ddply(data, .(fac1, fac2), mean(bar))
  res
}

foo(diamonds, "color", "cut", "price")

Luciano Selzer asked Jul 05 '11 14:07


2 Answers

I don't believe this is a bug. ddply expects a function as its third argument, and mean(bar) is not a function but a call to one (which is evaluated before ddply ever sees it). You need to write a complete function that calculates the mean you'd like:

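foo <- function(data, fac1, fac2, bar) {
  # x is the subset of data for one group; ind receives bar via ddply's `...`
  # [[ ]] returns the column as a plain vector for data.frames and tibbles alike
  res <- ddply(data, c(fac1, fac2), function(x, ind) {
    mean(x[[ind]])
  }, bar)
  res
}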

Also, you shouldn't pass strings to .(), so I replaced it with c(). That way the function arguments can be passed to ddply directly.
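
As a rough illustration of the same idea, here is a minimal sketch on a made-up data frame (toy, g1, g2, val and group_mean are hypothetical names, not part of the question). It assumes plyr is installed, and it uses a closure that captures bar rather than forwarding it through ddply's extra arguments; either approach should work.

```r
library(plyr)

# Hypothetical toy data standing in for diamonds (names are illustrative only)
toy <- data.frame(
  g1  = c("a", "a", "b", "b"),
  g2  = c("x", "x", "y", "y"),
  val = c(1, 3, 10, 20)
)

# The anonymous function captures `bar` from the enclosing call,
# so nothing extra has to be forwarded through ddply's `...`
group_mean <- function(data, fac1, fac2, bar) {
  ddply(data, c(fac1, fac2), function(piece) mean(piece[[bar]]))
}

res <- group_mean(toy, "g1", "g2", "val")
print(res)
```

The closure keeps the call to ddply short; forwarding bar as an extra argument, as in the answer above, is equally valid and makes the dependency explicit.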

joran answered Oct 05 '22 10:10


There are quite a few things wrong with your code, but the main issue is: you are passing column names as character strings.

Just doing a 'find-and-replace' with your parameters within the function yields:

res <- ddply(diamonds, .("color", "cut"), mean("price"))

If you understand how ddply works (I kind of doubt this, given the rest of the code), you will see that this cannot work. Ignoring the error in the last argument (the function) for now, the call should be written without quotes, because the .() notation is nothing more than plyr's way of providing the quotes for you:

res <- ddply(diamonds, .(color, cut), mean(price))

Fortunately, ddply also accepts a character vector as its second argument, i.e. the names of the columns, so (once again disregarding the issues with the last parameter) this should become:

foo <- function(data, facs, bar) {
  res <- ddply(data, facs, mean(bar))
  res
}

foo(diamonds, c("color", "cut"), "price")
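
To check that the two ways of naming the grouping columns are interchangeable, here is a small sketch on a made-up data frame (toy, g1, g2 and val are hypothetical names used only for illustration). It assumes plyr is installed and uses a proper summary function so that both calls actually run.

```r
library(plyr)

# Hypothetical toy data (names are illustrative only)
toy <- data.frame(
  g1  = c("a", "a", "b"),
  g2  = c("x", "y", "y"),
  val = c(1, 2, 3)
)

# Grouping columns given as unquoted names via .()
by_symbols <- ddply(toy, .(g1, g2), function(piece) mean(piece$val))

# The same grouping columns given as a character vector
by_strings <- ddply(toy, c("g1", "g2"), function(piece) mean(piece$val))

# Compare the two results
print(all.equal(by_symbols, by_strings))
```

The character-vector form is the one that works inside a function, because the column names arrive as strings in the function's arguments.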

Finally: the function you pass to ddply should take a data.frame as its first argument. On each call, that data.frame holds the subset of the data.frame you passed in (diamonds) for the current combination of color and cut. Neither mean("price") nor mean(price) is such a function. If you insist on using ddply, here's what you need to do:

foo <- function(data, facs, bar) {
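  # dfr is the subset for one group; colnm receives bar via ddply's `...`
  # [[ ]] returns the column as a plain vector for data.frames and tibbles alike
  res <- ddply(data, facs, function(dfr, colnm){mean(dfr[[colnm]])}, bar)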
  res
}
foo(diamonds, c("color", "cut"), "price")
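
To make the "one data.frame per group" behaviour visible, here is a minimal sketch on a made-up data frame (toy, g1, g2, val and inspect_piece are hypothetical names, not from the question). It assumes plyr is installed and simply reports what the function receives for each combination of the grouping columns.

```r
library(plyr)

# Hypothetical toy data (names are illustrative only)
toy <- data.frame(
  g1  = c("a", "a", "b", "b"),
  g2  = c("x", "x", "x", "y"),
  val = c(1, 3, 10, 20)
)

# Called once per group: `piece` is that group's rows,
# `colnm` is forwarded from the extra argument given to ddply
inspect_piece <- function(piece, colnm) {
  data.frame(
    is_df    = is.data.frame(piece),
    n_rows   = nrow(piece),
    mean_val = mean(piece[[colnm]])
  )
}

pieces <- ddply(toy, c("g1", "g2"), inspect_piece, "val")
print(pieces)
```

Returning a one-row data.frame instead of a bare number also lets you name the result columns yourself.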

Nick Sabbe answered Oct 05 '22 10:10