I would like to group data in a data.frame by two columns and then sum a specific third column. For example:
> aggregate(mpg~gear+cyl, data=mtcars, FUN=sum)
gear cyl mpg
1 3 4 21.5
2 4 4 215.4
3 5 4 56.4
4 3 6 39.5
5 4 6 79.0
6 5 6 19.7
7 3 8 180.6
8 5 8 30.8
Now, I need to do this several times for different columns. So I would like to write a function which generalizes this. It take the data.frame and one of the columns (to keep things simple) and does the same thing.
agg.data <- function(df, colname) {
aggregate(mpg~gear+colname, data=df, FUN=sum)
}
Running this will produce:
Error in eval(expr, envir, enclos) : object 'colname' not found
How can I pass in the value of colname to aggregate?
Paste together a string representation of your formula, and give that string as an argument to formula()...
agg.data <- function(df, colname) {
aggregate(formula(paste0("mpg~gear+", colname)), data=df, FUN=sum)
}
> agg.data(mtcars, "cyl")
gear cyl mpg
1 3 4 21.5
2 4 4 215.4
3 5 4 56.4
4 3 6 39.5
5 4 6 79.0
6 5 6 19.7
7 3 8 180.6
8 5 8 30.8
Using data.table
:
fun.dt <- function(dt, col) {
dt[, .(mpg=sum(mpg)), by=c("gear", col)]
}
require(data.table)
dt = as.data.table(mtcars)
fun.dt(dt, "cyl")
# gear cyl mpg
# 1: 4 6 79.0
# 2: 4 4 215.4
# 3: 3 6 39.5
# 4: 3 8 180.6
# 5: 3 4 21.5
# 6: 5 4 56.4
# 7: 5 8 30.8
# 8: 5 6 19.7
The by
expression in data.tables can also take a character vector of column names in addition to lists of columns/expressions. We can simply provide a character vector to the by
argument.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With