Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Aggregate by string column name in R

Tags:

r

I would like to group data in a data.frame by two columns and then sum a specific third column. For example:

> aggregate(mpg~gear+cyl, data=mtcars, FUN=sum)
  gear cyl   mpg
1    3   4  21.5
2    4   4 215.4
3    5   4  56.4
4    3   6  39.5
5    4   6  79.0
6    5   6  19.7
7    3   8 180.6
8    5   8  30.8

Now, I need to do this several times for different columns. So I would like to write a function which generalizes this. It take the data.frame and one of the columns (to keep things simple) and does the same thing.

agg.data <- function(df, colname) {
  aggregate(mpg~gear+colname, data=df, FUN=sum) 
}

Running this will produce:

Error in eval(expr, envir, enclos) : object 'colname' not found

How can I pass in the value of colname to aggregate?

like image 536
oneself Avatar asked Dec 05 '22 22:12

oneself


2 Answers

Paste together a string representation of your formula, and give that string as an argument to formula()...

agg.data <- function(df, colname) {
  aggregate(formula(paste0("mpg~gear+", colname)), data=df, FUN=sum) 
}

> agg.data(mtcars, "cyl")
  gear cyl   mpg
1    3   4  21.5
2    4   4 215.4
3    5   4  56.4
4    3   6  39.5
5    4   6  79.0
6    5   6  19.7
7    3   8 180.6
8    5   8  30.8
like image 101
cory Avatar answered Dec 29 '22 20:12

cory


Using data.table:

fun.dt <- function(dt, col) {
    dt[, .(mpg=sum(mpg)), by=c("gear", col)]
}

require(data.table)
dt = as.data.table(mtcars)
fun.dt(dt, "cyl")
#    gear cyl   mpg
# 1:    4   6  79.0
# 2:    4   4 215.4
# 3:    3   6  39.5
# 4:    3   8 180.6
# 5:    3   4  21.5
# 6:    5   4  56.4
# 7:    5   8  30.8
# 8:    5   6  19.7

The by expression in data.tables can also take a character vector of column names in addition to lists of columns/expressions. We can simply provide a character vector to the by argument.

like image 39
Arun Avatar answered Dec 29 '22 21:12

Arun