Name columns within aggregate in R

Update

This update is just meant to help you get started on deriving a solution of your own.

If you inspect the code for stats:::aggregate.formula, you'll see the following lines towards the end (the exact lines may differ between R versions):

if (is.matrix(mf[[1L]])) {
    lhs <- as.data.frame(mf[[1L]])
    names(lhs) <- as.character(m[[2L]][[2L]])[-1L]
    aggregate.data.frame(lhs, mf[-1L], FUN = FUN, ...)
}
else aggregate.data.frame(mf[1L], mf[-1L], FUN = FUN, ...)

If all that you want to do is append the function name to the variable that was aggregated, perhaps you can change that to something like:

if (is.matrix(mf[[1L]])) {
  lhs <- as.data.frame(mf[[1L]])
  names(lhs) <- as.character(m[[2L]][[2L]])[-1L]
  myOut <- aggregate.data.frame(lhs, mf[-1L], FUN = FUN, ...)
  colnames(myOut) <- c(names(mf[-1L]), 
                       paste(names(lhs), deparse(substitute(FUN)), sep = "."))
}
else {
  myOut <- aggregate.data.frame(mf[1L], mf[-1L], FUN = FUN, ...)
  colnames(myOut) <- c(names(mf[-1L]), 
                       paste(strsplit(gsub("cbind\\(|\\)|\\s", "", 
                                           names(mf[1L])), ",")[[1]],
                             deparse(substitute(FUN)), sep = "."))
} 
myOut

This captures the expression passed as FUN by using deparse(substitute(FUN)), so you could modify the function to accept a custom suffix, or even a vector of suffixes. It could be improved with some work, but I'm not going to do it!

Here is a Gist with this concept applied, creating a function named "myAgg".

Here is some sample output of just the resulting column names:

> names(myAgg(weight ~ feed, data = chickwts, mean))
[1] "feed"        "weight.mean"
> names(myAgg(breaks ~ wool + tension, data = warpbreaks, sum))
[1] "wool"       "tension"    "breaks.sum"
> names(myAgg(weight ~ feed, data = chickwts, FUN = function(x) mean(x^2)))
[1] "feed"                         "weight.function(x) mean(x^2)"

Notice that only the aggregated variable name changes. But notice also that if you use a custom function, you'll end up with a really strange column name!
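One way to avoid the strange name is to let the caller supply the suffix. The following is only a minimal sketch, not the Gist's myAgg: agg_suffix and its suffix argument are hypothetical names, and instead of editing aggregate.formula it renames the columns after calling aggregate. It assumes the right-hand side of the formula names every grouping variable explicitly (no ".").

```r
# Hypothetical helper (not part of base R): aggregate with the formula
# interface, then append a suffix to the aggregated columns only.
agg_suffix <- function(formula, data, FUN, suffix = NULL, ...) {
  # Default suffix: the expression passed as FUN, captured the same way
  # as in the modified code above.
  if (is.null(suffix)) {
    suffix <- paste(deparse(substitute(FUN)), collapse = "")
  }
  out <- aggregate(formula, data = data, FUN = FUN, ...)
  # Assumption: every grouping variable is written out on the right-hand
  # side, so counting them gives the number of leading grouping columns.
  n_groups <- length(all.vars(formula[[3L]]))
  value_cols <- seq_along(out) > n_groups
  names(out)[value_cols] <- paste(names(out)[value_cols], suffix, sep = ".")
  out
}

names(agg_suffix(weight ~ feed, data = chickwts, FUN = mean))
# "feed" "weight.mean"
names(agg_suffix(weight ~ feed, data = chickwts,
                 FUN = function(x) mean(x^2), suffix = "meansq"))
# "feed" "weight.meansq"
```

Renaming after the fact keeps the sketch independent of the internals of aggregate.formula, at the cost of the formula restriction noted above.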

The answer to your first question is yes. You can certainly include the column names in the aggregate function. Using the names from your example above:

blubb <- aggregate(dat, list(One = dat$One, Two = dat$Two), sum)
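Since dat is not shown in this answer, here is the same idea as a self-contained sketch on the built-in warpbreaks data, used purely as a stand-in. Only the value column is passed as the first argument, so the grouping columns are not aggregated along with it; the names given in the by list become the names of the grouping columns.

```r
# Aggregate only the value column; the names in the `by` list become
# the names of the grouping columns in the result.
res <- aggregate(warpbreaks["breaks"],
                 by = list(Wool = warpbreaks$wool, Tension = warpbreaks$tension),
                 FUN = sum)
names(res)
# "Wool" "Tension" "breaks"
```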

I like the part about possibly pulling in the original column names automatically. If I figure it out I'll post it.

If you prefer writing aggregates with the formula interface, the documentation shows the use of cbind. cbind lets you name its arguments, and aggregate uses those names for the output columns.

aggregate(cbind(SLength = Sepal.Length) ~ cbind(Type = Species),
  data = iris, mean)
#  Type SLength
#1    1   5.006
#2    2   5.936
#3    3   6.588

But cbind replaces factors by their internal codes. To avoid this you can use:

aggregate(SLength ~ Type, with(iris, data.frame(SLength = Sepal.Length,
  Type = Species)), mean)
#        Type SLength
#1     setosa   5.006
#2 versicolor   5.936
#3  virginica   6.588

with(iris, aggregate(data.frame(SLength = Sepal.Length),
  data.frame(Type = Species), mean))
#        Type SLength
#1     setosa   5.006
#2 versicolor   5.936
#3  virginica   6.588

aggregate(data.frame(SLength = iris$Sepal.Length),
  data.frame(Type = iris$Species), mean)
#        Type SLength
#1     setosa   5.006
#2 versicolor   5.936
#3  virginica   6.588
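The factor-to-code conversion mentioned above can be checked directly. This is a small sketch (the variable names codes and restored are just for illustration) showing that cbind returns an integer matrix for a factor, and how the labels could be mapped back afterwards if you stay with cbind:

```r
# cbind() drops the factor attributes and keeps only the integer codes.
codes <- cbind(Type = iris$Species)
is.integer(codes)
# TRUE

# The codes index into the original levels, so the labels can be restored.
restored <- factor(codes[, "Type"], labels = levels(iris$Species))
identical(as.character(restored), as.character(iris$Species))
# TRUE
```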

The advantage of using cbind or data.frame compared to list is that not every column needs to be given a (new) name. Aggregating more than one column by more than one grouping factor can be done like this:

aggregate(cbind("Miles/gallon" = mpg, Weight = wt, hp) ~ cbind(Cylinders =
  cyl) + cbind(Carburetors = carb) + gear, data = mtcars, mean)
#   Cylinders Carburetors gear Miles/gallon  Weight    hp
#1          4           1    3        21.50 2.46500  97.0
#2          6           1    3        19.75 3.33750 107.5
#...

and if you want to use more than one function:

aggregate(cbind(cases=ncases, ncontrols) ~ cbind(alc=alcgp) + tobgp,
  data = esoph, FUN = function(x) c("mean" = mean(x), "median" = median(x)))

#   alc    tobgp cases.mean cases.median ncontrols.mean ncontrols.median
#1    1 0-9g/day  1.5000000    1.0000000      43.500000        47.000000
#2    2 0-9g/day  5.6666667    4.0000000      29.833333        34.500000
#...

which, in the printed output, appends the name of each aggregate function to the column name.

Here again cbind replaces factors by their internal codes. To avoid this you can use:

with(esoph, aggregate(data.frame(cases=ncases, ncontrols),
 data.frame(alc=alcgp, tobgp),
 FUN = function(x) c("mean" = mean(x), "median" = median(x))))
#         alc    tobgp cases.mean cases.median ncontrols.mean ncontrols.median
#1  0-39g/day 0-9g/day  1.5000000    1.0000000      43.500000        47.000000
#2      40-79 0-9g/day  5.6666667    4.0000000      29.833333        34.500000
#...
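One detail worth knowing about the output above: when FUN returns more than one value, aggregate stores the results for each variable as a matrix column, so names() of the result still reports only cases and ncontrols. A minimal sketch of flattening these into ordinary columns with do.call(data.frame, ...), using a single grouping variable to keep it short (res and flat are illustrative names):

```r
res <- with(esoph, aggregate(data.frame(cases = ncases, ncontrols),
                             data.frame(tobgp),
                             FUN = function(x) c(mean = mean(x), median = median(x))))

# Each aggregated variable is a two-column matrix inside the data frame.
names(res)
# "tobgp" "cases" "ncontrols"

# data.frame() expands matrix columns into separate, suffixed columns.
flat <- do.call(data.frame, res)
names(flat)
# "tobgp" "cases.mean" "cases.median" "ncontrols.mean" "ncontrols.median"
```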
