Hadley turned me on to the plyr package and I find myself using it all the time to do 'group by' sort of stuff. But I find myself having to always rename the resulting columns since they default to V1, V2, etc.
Here's an example:
mydata<-data.frame(matrix(rnorm(144, mean=2, sd=2),72,2),c(rep("A",24),rep("B",24),rep("C",24)))
colnames(mydata) <- c("x_value", "acres", "state")
groupAcres <- ddply(mydata, c("state"), function(df)c(sum(df$acres)))
colnames(groupAcres) <- c("state","stateAcres")
Is there a way to make ddply name the resulting column for me so I can omit that last line?
To rename a column in R, you can use the rename() function from dplyr. For example, if you want to rename the column “A” to “B” again, you can run the following code: rename(dataframe, B = A) .
plyr is an R package that makes it simple to split data apart, do stuff to it, and mash it back together. This is a common data-manipulation step. Importantly, plyr makes it easy to control the input and output data format from a syntactically consistent set of functions.
If you select option R, a panel is displayed to allow you to enter the new data set name. Type the new data set name and press Enter to rename, or enter the END command to cancel. Either action returns you to the previous panel.
Use summarise (or summarize):
groupAcres <- ddply(mydata, "state", summarise,
myName = sum(acres))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With