ddply aggregated column names

Question

I am using ddply to aggregate my data but haven't found an elegant way to assign column names to the output data frame.

At the moment I am doing this:

agg_data <- ddply(raw_data, .(id, date, classification), nrow)
names(agg_data)[4] <- "no_entries"

and then this:

agg_data <- ddply(agg_data, .(classification, date), colwise(mean, .(no_entries)) )
names(agg_data)[3] <- "avg_no_entries"

Is there a better, more elegant way to do this?

JD Long · Accepted Answer

The generic form I use a lot is:

ddply(raw_data, .(id, date, classification), function(x) data.frame(no_entries = nrow(x)))

I use anonymous functions in my ddply statements almost all the time, so this idiom fits in naturally. It is not the most concise way to call a simple function like nrow(), but for functions where I pass multiple arguments, I like it a lot.
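As a rough sketch of how this idiom could cover both steps from the question, the example below builds a small made-up data frame (the values in raw_data are hypothetical, and date is kept as plain text for simplicity) and names each output column inside data.frame():

```r
library(plyr)

# Hypothetical toy data standing in for raw_data; the values are made up.
raw_data <- data.frame(
  id = c(1, 1, 1, 2, 2),
  date = c("2011-01-01", "2011-01-01", "2011-01-02",
           "2011-01-01", "2011-01-01"),
  classification = c("a", "a", "a", "b", "b")
)

# Step 1: count rows per group; the column name is set inside data.frame().
agg_data <- ddply(raw_data, .(id, date, classification),
                  function(x) data.frame(no_entries = nrow(x)))

# Step 2: average the counts per classification and date,
# again naming the column where the value is computed.
avg_data <- ddply(agg_data, .(classification, date),
                  function(x) data.frame(avg_no_entries = mean(x$no_entries)))

names(agg_data)
names(avg_data)
```

Because data.frame() accepts several name = value pairs, the same anonymous function could return more than one named summary column per group.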

joran · Answer

You can use summarise:

agg_data <- ddply(raw_data, .(id, date, classification), summarise, "no_entries" = nrow(piece))

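or, if nrow(piece) doesn't work, you can use length(<column_name>) instead. (piece is not something you define yourself; it appears to be a name from plyr's internals, so it may not be available in every version.) For instance, here's an example that should be runnable by anyone: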

ddply(baseball, .(year), summarise, newColumn = nrow(piece))

or

ddply(baseball, .(year), summarise, newColumn = length(year))

EDIT

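Or, as Joshua comments, the all-caps version, NROW, does the checking for you: it treats a vector as a one-column matrix, so it works on a single column as well as on a data frame.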
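To make the comparison concrete, here is a small sketch using the baseball data that ships with plyr. It assumes that length() and NROW() applied to a column inside summarise give the same per-group counts, and it shows how the second step from the question (averaging the counts) might be written the same way. The names by_length, by_NROW, entries and avg_entries are only illustrative.

```r
library(plyr)

# baseball ships with plyr, so no extra data is needed.
by_length <- ddply(baseball, .(year), summarise, no_entries = length(year))
by_NROW   <- ddply(baseball, .(year), summarise, no_entries = NROW(year))

# Both calls should give the same per-year counts.
identical(by_length, by_NROW)

# Two-step version: count rows per year and team, then average per year.
entries <- ddply(baseball, .(year, team), summarise, no_entries = length(id))
avg_entries <- ddply(entries, .(year), summarise,
                     avg_no_entries = mean(no_entries))
head(avg_entries)
```

In both steps the output column gets its name directly in the summarise call, so no names()<- fix-up is needed afterwards.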

Tags:

r

plyr

behas

