Use ddply within a function and include variable of interest as an argument

Tags: r, plyr
I am relatively new to R, and trying to use ddply & summarise from the plyr package. This post almost, but not quite, answers my question. I could use some additional explanation/clarification.

My problem:

I want to create a simple function to summarize descriptive statistics, by group, for a given variable. Unlike the linked post, I would like to include the variable of interest as an argument to the function. As has already been discussed on this site, this works:

require(plyr)

ddply(mtcars, ~ cyl, summarise,
  mean = mean(hp),
  sd   = sd(hp),
  min  = min(hp),
  max  = max(hp)
)

But this doesn't:

descriptives_by_group <- function(dataset, group, x)
{
  ddply(dataset, ~ group, summarise,
    mean = mean(x),
    sd   = sd(x),
    min  = min(x),
    max  = max(x)
  )
}

descriptives_by_group(mtcars, cyl, hp)

Because of the volume of data I am working with, I would like a function that lets me specify the variable of interest as well as the dataset and the grouping variable.

I have tried to edit the various solutions found here to address my problem, but I don't understand the code well enough to do it successfully.

The original poster used the following example dataset:

a = c(1,2,3,4)
b = c(0,0,1,1)
c = c(5,6,7,8)
df = data.frame(a,b,c)
sv = c("b")

With the desired output:

  b Ave
1 0 1.5
2 1 3.5

And the solution endorsed by Hadley was:

myFunction <- function(x, y){
  NewColName <- "a"
  z <- ddply(x, y, .fun = function(xx, col){
                           c(Ave = mean(xx[, col], na.rm = TRUE))},
             NewColName)
  return(z)
}

Where myFunction(df, sv) returns the desired output.

I tried to break down the code piece-by-piece to see if, by getting a better understanding of the underlying mechanics, I could modify the code to include an argument to the function that would pass to what, in this example, is "NewColName" (the variable you want to get information about). But I am not having any success. My difficulty is that I do not understand what is happening with (xx[,col]). I know that mean(xx[,col]) should be taking the mean of the column with index col for the data frame xx. But I don't understand where the anonymous function is reading those values from.

Could someone please help me parse this? I've wasted hours on a trivial task I could accomplish easily with very repetitive code and/or with subsetting, but I got hung up on trying to make my script simpler and more elegant, and on understanding the "whys" of this problem and its solution(s).

PS I have looked into the describeBy function from the psych package, but as far as I can tell, it does not let you specify the variable(s) you want to return values for, and consequently does not solve my problem.

A Jack asked Aug 29 '13 16:08


3 Answers

I just moved a couple things around in the example function you gave and showed how to get more than one column back out. Does this do what you want?

myFunction2 <- function(x, y, col){
  # xx is the subset of x for one group; col is found in the enclosing function
  z <- ddply(x, y, .fun = function(xx){
                           c(mean = mean(xx[, col], na.rm = TRUE),
                             max = max(xx[, col], na.rm = TRUE))})
  return(z)
}

myFunction2(mtcars, "cyl", "hp")
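On the question of where the anonymous function reads its values from: ddply splits the data frame by the grouping variable and calls .fun once per piece, so xx is one group's rows, and any further arguments given to ddply (NewColName in the original solution) are passed on to .fun, where they arrive as col. Here is a rough base R sketch of that split-apply-combine sequence, using the example data from the question. It only mimics the behaviour and is not plyr's actual implementation; f, pieces, results and out are names made up for illustration.

```r
# Base R approximation of ddply(df, "b", .fun = f, "a").
# Assumption: this imitates the behaviour only; plyr does more internally.
df <- data.frame(a = c(1, 2, 3, 4), b = c(0, 0, 1, 1), c = c(5, 6, 7, 8))

# The per-group function: xx is one group's rows, col is a column name.
f <- function(xx, col) c(Ave = mean(xx[, col], na.rm = TRUE))

# 1. Split the data frame into one piece per value of b.
pieces <- split(df, df$b)

# 2. Call f on each piece; "a" is handed on as the extra argument col.
results <- lapply(pieces, f, col = "a")

# 3. Bind the per-group results back into a single data frame.
out <- data.frame(b   = as.numeric(names(results)),
                  Ave = unlist(results, use.names = FALSE))
print(out)
```

The printed result should match the desired output shown in the question (Ave of 1.5 for b = 0 and 3.5 for b = 1).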
aosmith answered Nov 18 '22 17:11


(More of a comment than an answer. I had the same level of difficulty as you when using ddply(..., summarise, ...) inside a function.) This is a base R solution that worked the way I expected:

descriptives_by_group <- function(dataset, group, x) {
  aggregate(dataset[[x]], dataset[group], function(x)
    c(mean = mean(x),
      sd   = sd(x),
      min  = min(x),
      max  = max(x)))
}

descriptives_by_group(mtcars, 'cyl', 'hp')
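One thing to be aware of with the aggregate approach: because the summary function returns four values, aggregate stores them as a single matrix column (named x) rather than as four separate columns. If ordinary columns are wanted, one possible way to flatten the result is do.call(data.frame, ...). The sketch below repeats the function so that it runs on its own; v, res and flat are illustrative names, not part of the original answer.

```r
# Same idea as the function above, repeated so this block is self-contained.
descriptives_by_group <- function(dataset, group, x) {
  aggregate(dataset[[x]], dataset[group], function(v)
    c(mean = mean(v), sd = sd(v), min = min(v), max = max(v)))
}

res <- descriptives_by_group(mtcars, "cyl", "hp")

# res has two columns: cyl, and a matrix column x holding the four statistics.
str(res)

# Rebuilding the data frame expands the matrix into separate columns,
# named x.mean, x.sd, x.min and x.max.
flat <- do.call(data.frame, res)
names(flat)
```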
IRTFM answered Nov 18 '22 16:11


Just use the as.quoted function, which turns a character string into the quoted variable that ddply expects. Example below:

simple_ddply <- function(dataset_name, variable_name){
    data <- ddply(dataset_name, as.quoted(variable_name), <remaining input>)
}
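For the descriptives_by_group function from the question, a complete version along these lines might look like the sketch below. It assumes the group and the variable of interest are both passed as character strings, and it swaps summarise for an anonymous function that returns a one-row data frame, so it is one possible way to finish the idea rather than the answerer's own code.

```r
library(plyr)

# Hypothetical completion: group and x are column names given as strings.
descriptives_by_group <- function(dataset, group, x) {
  ddply(dataset, as.quoted(group), function(d) {
    v <- d[[x]]  # the variable of interest within this group
    data.frame(mean = mean(v), sd = sd(v), min = min(v), max = max(v))
  })
}

res <- descriptives_by_group(mtcars, "cyl", "hp")
print(res)
```

This should return one row per value of cyl, with the grouping column followed by the four statistics.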
Anna answered Nov 18 '22 16:11