I would like to group data in a data.frame by two columns and then sum a specific third column. For example:
> aggregate(mpg~gear+cyl, data=mtcars, FUN=sum)
  gear cyl   mpg
1    3   4  21.5
2    4   4 215.4
3    5   4  56.4
4    3   6  39.5
5    4   6  79.0
6    5   6  19.7
7    3   8 180.6
8    5   8  30.8
Now, I need to do this several times for different columns. So I would like to write a function which generalizes this. It take the data.frame and one of the columns (to keep things simple) and does the same thing.
agg.data <- function(df, colname) {
  aggregate(mpg~gear+colname, data=df, FUN=sum) 
}
Running this will produce:
Error in eval(expr, envir, enclos) : object 'colname' not found
How can I pass in the value of colname to aggregate?
Paste together a string representation of your formula, and give that string as an argument to formula()...
agg.data <- function(df, colname) {
  aggregate(formula(paste0("mpg~gear+", colname)), data=df, FUN=sum) 
}
> agg.data(mtcars, "cyl")
  gear cyl   mpg
1    3   4  21.5
2    4   4 215.4
3    5   4  56.4
4    3   6  39.5
5    4   6  79.0
6    5   6  19.7
7    3   8 180.6
8    5   8  30.8
                        Using data.table:
fun.dt <- function(dt, col) {
    dt[, .(mpg=sum(mpg)), by=c("gear", col)]
}
require(data.table)
dt = as.data.table(mtcars)
fun.dt(dt, "cyl")
#    gear cyl   mpg
# 1:    4   6  79.0
# 2:    4   4 215.4
# 3:    3   6  39.5
# 4:    3   8 180.6
# 5:    3   4  21.5
# 6:    5   4  56.4
# 7:    5   8  30.8
# 8:    5   6  19.7
The by expression in data.tables can also take a character vector of column names in addition to lists of columns/expressions. We can simply provide a character vector to the by argument.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With