ddply aggregated column names

Tags: r, plyr

I am using ddply to aggregate my data but haven't found an elegant way to assign column names to the output data frame.

At the moment I am doing this:

agg_data <- ddply(raw_data, .(id, date, classification), nrow)
names(agg_data)[4] <- "no_entries"

and this

agg_data <- ddply(agg_data, .(classification, date), colwise(mean, .(no_entries)) )
names(agg_data)[3] <- "avg_no_entries"

Is there a better, more elegant way to do this?

asked Jul 28 '11 17:07 by behas


2 Answers

The generic form I use a lot is:

 ddply(raw_data, .(id, date, classification), function(x) data.frame( no_entries=nrow(x) ))

Nearly all of my ddply calls use anonymous functions, so this idiom fits in naturally. It is not the most concise way to wrap a single function like nrow(), but for functions where I pass multiple arguments, I like it a lot.
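
As a minimal sketch of the same idiom with more than one output column: every named element of the data.frame returned by the anonymous function becomes a named column in the result. The toy raw_data and the value and mean_value columns below are made up for illustration and are not part of the question.

```r
library(plyr)

# Hypothetical toy data standing in for the question's raw_data
raw_data <- data.frame(
  id             = c(1, 1, 2, 2, 2),
  date           = rep("2011-07-28", 5),
  classification = c("a", "a", "b", "b", "b"),
  value          = c(10, 20, 30, 40, 50),
  stringsAsFactors = FALSE
)

# The names given inside data.frame() are the output column names,
# so no names(agg_data)[4] <- ... step is needed afterwards
agg_data <- ddply(raw_data, .(id, date, classification), function(x) {
  data.frame(no_entries = nrow(x), mean_value = mean(x$value))
})

print(agg_data)
```

The grouping columns come first in the result, followed by the columns defined in the returned data frame.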

answered Oct 15 '22 19:10 by JD Long


You can use summarise:

agg_data <- ddply(raw_data, .(id, date, classification), summarise, "no_entries" = nrow(piece))

or, if nrow(piece) doesn't work, you can use length(<column_name>) instead. For instance, here's an example that should be runnable by anyone, using the baseball data set that ships with plyr:

ddply(baseball, .(year), summarise, newColumn = nrow(piece))

or

ddply(baseball, .(year), summarise, newColumn = length(year))

EDIT

Or, as Joshua comments, the all-caps version, NROW, does the checking for you: it accepts a plain vector as well as a data frame.
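
Applied to both steps from the question, the summarise approach might look like the sketch below. The toy raw_data is invented for illustration, and the sketch uses length() on a column rather than nrow(piece), so it does not depend on piece being available.

```r
library(plyr)

# Hypothetical toy data standing in for the question's raw_data
raw_data <- data.frame(
  id             = c(1, 1, 2, 2, 2, 3),
  date           = rep("2011-07-28", 6),
  classification = c("a", "a", "a", "a", "a", "b"),
  stringsAsFactors = FALSE
)

# Step 1: count rows per (id, date, classification) and name the column directly.
# NROW(id) would give the same count here.
agg_data <- ddply(raw_data, .(id, date, classification), summarise,
                  no_entries = length(id))

# Step 2: average the counts per (classification, date), again naming the column
avg_data <- ddply(agg_data, .(classification, date), summarise,
                  avg_no_entries = mean(no_entries))

print(avg_data)
```

Both calls name their output column in the ddply call itself, so neither needs a follow-up names(...) <- assignment.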

answered Oct 15 '22 18:10 by joran