I am using ddply to aggregate my data but haven't found an elegant way to assign column names to the output data frame.
At the moment I am doing this:
agg_data <- ddply(raw_data, .(id, date, classification), nrow)
names(agg_data)[4] <- "no_entries"
and then this:
agg_data <- ddply(agg_data, .(classification, date), colwise(mean, .(no_entries)) )
names(agg_data)[3] <- "avg_no_entries"
Is there a better, more elegant way to do this?
The generic form I use a lot is:
ddply(raw_data, .(id, date, classification), function(x) data.frame(no_entries = nrow(x)))
I use anonymous functions in my ddply statements almost all the time, so this idiom fits in well. It is not the most concise way to call a single function like nrow(), but for functions where I pass multiple arguments, I like it a lot.
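For example, here is a minimal sketch of that idiom returning two named columns in one pass. The raw_data frame and its value column are made up for illustration; they are not from the question.

```r
library(plyr)

# Hypothetical stand-in for raw_data; the 'value' column is an assumption.
raw_data <- data.frame(
  id             = c(1, 1, 1, 2, 2),
  date           = c("2011-01-01", "2011-01-01", "2011-01-02",
                     "2011-01-01", "2011-01-01"),
  classification = c("a", "a", "a", "b", "b"),
  value          = c(10, 20, 30, 40, 50)
)

# The anonymous function receives each group as a data frame 'x'.
# Whatever names are given inside data.frame() become the output column names.
agg_data <- ddply(raw_data, .(id, date, classification), function(x) {
  data.frame(
    no_entries = nrow(x),        # rows in this group
    avg_value  = mean(x$value)   # a second summary that needs an argument
  )
})

print(agg_data)
```

The grouping columns come first in the result, followed by the columns named in data.frame(), so no names(...) <- fix-up is needed afterwards.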
You can use summarise:
agg_data <- ddply(raw_data, .(id, date, classification), summarise, "no_entries" = nrow(piece))
or you can use length(<column_name>) if nrow(piece) doesn't work. For instance, here's an example that should be runnable by anyone:
ddply(baseball, .(year), summarise, newColumn = nrow(piece))
or
ddply(baseball, .(year), summarise, newColumn = length(year))
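Applied to the two steps from the question, the same pattern might look like the sketch below. The raw_data frame is a made-up stand-in, and length(id) is used for the count rather than nrow(piece).

```r
library(plyr)

# Hypothetical stand-in for the question's raw_data.
raw_data <- data.frame(
  id             = c(1, 1, 2, 2, 2, 2, 3),
  date           = rep("2011-01-01", 7),
  classification = c("a", "a", "a", "a", "a", "a", "b")
)

# Step 1: count rows per id/date/classification and name the column directly.
agg_data <- ddply(raw_data, .(id, date, classification), summarise,
                  no_entries = length(id))

# Step 2: average those counts per classification/date, again naming inline.
avg_data <- ddply(agg_data, .(classification, date), summarise,
                  avg_no_entries = mean(no_entries))

print(avg_data)
```

Both calls name the new column in the summarise argument itself, which replaces the names(agg_data)[n] <- ... lines.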
EDIT
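Or, as Joshua comments, the all-caps version, NROW, does the checking for you: it treats a vector as a one-column matrix, so it returns a count for vectors as well as data frames.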
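A quick way to see the difference, as a sketch: nrow() returns NULL for a plain vector, while NROW() returns its length. The ddply call assumes the baseball data set that ships with plyr.

```r
library(plyr)

x <- c(10, 20, 30)
is.null(nrow(x))  # TRUE: a plain vector has no dim attribute
NROW(x)           # 3: the vector is treated as a one-column matrix

# Inside summarise, 'year' is a plain vector for each group, so NROW() works.
counts <- ddply(baseball, .(year), summarise, no_entries = NROW(year))
head(counts)
```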