Method 1: Using the setNames() function. setNames() sets the names of an object and returns that object. For a data frame, the columns can be renamed by passing a character vector of new names built with c().
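As a minimal sketch of the idea above, the following renames both columns of a small, made-up data frame in one step. The data frame `df` and the new names are illustrative assumptions only.

```r
# Hypothetical example data; the column names a and b are placeholders.
df <- data.frame(a = 1:3, b = c("x", "y", "z"))

# setNames() returns a copy of df with the new names; df itself is unchanged.
renamed <- setNames(df, c("id", "label"))

names(renamed)
names(df)
```

Because setNames() returns the renamed object instead of modifying it in place, it can wrap another call directly, which is what makes it convenient around aggregate().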
To rename a column in R you can use the rename() function from dplyr. For example, to rename the column “A” to “B”, you can run the following code: rename(dataframe, B = A).
To change multiple column names by name or by index, use the rename() function from the dplyr package; to rename by name only, use setnames() from data.table. In base R, the colnames() and names() functions can be used to rename a data frame column by a single index or name.
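The base R route can be sketched as follows, once by name and once by index. The data frame and its column names are made up for illustration.

```r
# Hypothetical example data.
df <- data.frame(A = 1:3, C = 4:6)

# Rename by name: find the column called "A" and call it "B".
names(df)[names(df) == "A"] <- "B"

# Rename by index: the second column becomes "D".
colnames(df)[2] <- "D"

names(df)
```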
We can use the aggregate() function in R to produce summary statistics for one or more variables in a data frame. In its formula interface, the variable to summarize (sum_var) goes on the left-hand side of the formula.
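A small sketch of the formula interface, using the built-in chickwts data set: here weight plays the role of sum_var, and feed plays the role of the grouping variable (called group_var in the comments, a name assumed for illustration).

```r
# sum_var ~ group_var: summarize weight within each level of feed.
summary_df <- aggregate(weight ~ feed, data = chickwts, FUN = mean)

# The result keeps the original variable names: grouping column first,
# summarized column second.
names(summary_df)
```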
You can use setNames, as in:
blubb <- setNames(aggregate(dat$two ~ dat$one, ...), c("One", "Two"))
Alternatively, you can bypass the slick formula method, and use syntax like:
blubb <- aggregate(list(Two = dat$two), list(One = dat$one), ...)
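To make the two approaches concrete, here is a sketch with a tiny made-up `dat` and `sum` standing in for the elided arguments. In the list form, the first list holds the column being summarized and the second holds the grouping column; the output always puts the grouping column first.

```r
# Hypothetical data: "one" is the grouping column, "two" the values.
dat <- data.frame(one = c("a", "a", "b"), two = c(1, 2, 3))

# Formula method: the default names would be "dat$one" and "dat$two",
# so setNames() replaces them.
by_formula <- setNames(aggregate(dat$two ~ dat$one, FUN = sum),
                       c("One", "Two"))

# List method: the names come from the list element names.
by_list <- aggregate(list(Two = dat$two), list(One = dat$one), FUN = sum)

names(by_formula)
names(by_list)
```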
This update is just to help get you started on deriving a solution on your own.
If you inspect the code for stats:::aggregate.formula, you'll see the following lines towards the end:
if (is.matrix(mf[[1L]])) {
    lhs <- as.data.frame(mf[[1L]])
    names(lhs) <- as.character(m[[2L]][[2L]])[-1L]
    aggregate.data.frame(lhs, mf[-1L], FUN = FUN, ...)
}
else aggregate.data.frame(mf[1L], mf[-1L], FUN = FUN, ...)
If all that you want to do is append the function name to the variable that was aggregated, perhaps you can change that to something like:
if (is.matrix(mf[[1L]])) {
    lhs <- as.data.frame(mf[[1L]])
    names(lhs) <- as.character(m[[2L]][[2L]])[-1L]
    myOut <- aggregate.data.frame(lhs, mf[-1L], FUN = FUN, ...)
    colnames(myOut) <- c(names(mf[-1L]),
                         paste(names(lhs), deparse(substitute(FUN)), sep = "."))
}
else {
    myOut <- aggregate.data.frame(mf[1L], mf[-1L], FUN = FUN, ...)
    colnames(myOut) <- c(names(mf[-1L]),
                         paste(strsplit(gsub("cbind\\(|\\)|\\s", "",
                                             names(mf[1L])), ",")[[1]],
                               deparse(substitute(FUN)), sep = "."))
}
myOut
This basically captures the value entered for FUN by using deparse(substitute(FUN)), so you can probably modify the function to accept a custom suffix, or perhaps even a vector of suffixes. This can probably be improved a bit with some work, but I'm not going to do it!
Here is a Gist with this concept applied, creating a function named "myAgg".
Here is some sample output of just the resulting column names:
> names(myAgg(weight ~ feed, data = chickwts, mean))
[1] "feed" "weight.mean"
> names(myAgg(breaks ~ wool + tension, data = warpbreaks, sum))
[1] "wool" "tension" "breaks.sum"
> names(myAgg(weight ~ feed, data = chickwts, FUN = function(x) mean(x^2)))
[1] "feed" "weight.function(x) mean(x^2)"
Notice that only the aggregated variable name changes. But notice also that if you use a custom function, you'll end up with a really strange column name!
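One way around the strange name is the custom-suffix idea mentioned earlier. The following is only a sketch of that idea as a thin wrapper around aggregate(); `agg_with_suffix` and its `suffix` argument are hypothetical names, not part of base R or of the Gist, and the sketch assumes the grouping variables are written out explicitly on the right-hand side of the formula (no `.` shorthand).

```r
# Hypothetical helper: aggregate, then append a caller-supplied suffix
# to every aggregated (non-grouping) column name.
agg_with_suffix <- function(formula, data, FUN, suffix, ...) {
  out <- aggregate(formula, data = data, FUN = FUN, ...)
  # aggregate() puts the grouping columns first; count them from the
  # right-hand side of the formula (assumes no "." on that side).
  n_groups <- length(all.vars(formula[[3L]]))
  value_cols <- seq_along(out)[-seq_len(n_groups)]
  names(out)[value_cols] <- paste(names(out)[value_cols], suffix, sep = ".")
  out
}

names(agg_with_suffix(weight ~ feed, chickwts,
                      function(x) mean(x^2), suffix = "meansq"))
names(agg_with_suffix(breaks ~ wool + tension, warpbreaks,
                      sum, suffix = "total"))
```

Passing the suffix explicitly avoids relying on deparse(substitute(FUN)), so anonymous functions no longer leak into the column names.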
The answer to your first question is yes. You can certainly include the column names in the aggregate function. Using the names from your example above:
blubb <- aggregate(dat, list(One = dat$One, Two = dat$Two), sum)
I like the part about possibly pulling in the original column names automatically. If I figure it out I'll post it.
In case you prefer writing aggregates as a formula, the documentation shows the usage of cbind. And cbind allows you to name its arguments, which are used by aggregate.
aggregate(cbind(SLength = Sepal.Length) ~ cbind(Type = Species),
          data = iris, mean)
#  Type SLength
#1    1   5.006
#2    2   5.936
#3    3   6.588
But cbind replaces factors by their internal codes. To avoid this you can use:
aggregate(SLength ~ Type, with(iris, data.frame(SLength = Sepal.Length,
                                                Type = Species)), mean)
#        Type SLength
#1     setosa   5.006
#2 versicolor   5.936
#3  virginica   6.588
or
with(iris, aggregate(data.frame(SLength = Sepal.Length),
                     data.frame(Type = Species), mean))
#        Type SLength
#1     setosa   5.006
#2 versicolor   5.936
#3  virginica   6.588
or
aggregate(data.frame(SLength = iris$Sepal.Length),
          data.frame(Type = iris$Species), mean)
#        Type SLength
#1     setosa   5.006
#2 versicolor   5.936
#3  virginica   6.588
The advantage of using cbind or data.frame compared to list is that not all columns need to be given a (new) name. Aggregation of more than one column by more than one grouping factor could be done like:
aggregate(cbind("Miles/gallon" = mpg, Weight = wt, hp) ~ cbind(Cylinders =
          cyl) + cbind(Carburetors = carb) + gear, data = mtcars, mean)
#  Cylinders Carburetors gear Miles/gallon  Weight    hp
#1         4           1    3        21.50 2.46500  97.0
#2         6           1    3        19.75 3.33750 107.5
#...
and if you want to use more than one function:
aggregate(cbind(cases = ncases, ncontrols) ~ cbind(alc = alcgp) + tobgp,
          data = esoph, FUN = function(x) c("mean" = mean(x), "median" = median(x)))
#  alc    tobgp cases.mean cases.median ncontrols.mean ncontrols.median
#1   1 0-9g/day  1.5000000    1.0000000      43.500000        47.000000
#2   2 0-9g/day  5.6666667    4.0000000      29.833333        34.500000
#...
which appends the name of the aggregate function to the column name.
Here again cbind replaces factors by their internal codes. To avoid this you can use:
with(esoph, aggregate(data.frame(cases = ncases, ncontrols),
                      data.frame(alc = alcgp, tobgp),
                      FUN = function(x) c("mean" = mean(x), "median" = median(x))))
#        alc    tobgp cases.mean cases.median ncontrols.mean ncontrols.median
#1 0-39g/day 0-9g/day  1.5000000    1.0000000      43.500000        47.000000
#2     40-79 0-9g/day  5.6666667    4.0000000      29.833333        34.500000
#...
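One thing worth knowing about the multi-function form: although the printed header shows names like cases.mean, the result actually stores each aggregated variable as a single matrix column. A common way to turn these into ordinary columns is do.call(data.frame, ...). The sketch below repeats the call above; the object names `res` and `flat` are illustrative.

```r
res <- with(esoph, aggregate(data.frame(cases = ncases, ncontrols),
                             data.frame(alc = alcgp, tobgp),
                             FUN = function(x) c(mean = mean(x),
                                                 median = median(x))))

# Only four top-level columns: two grouping columns and two matrix columns.
names(res)
is.matrix(res$cases)

# Rebuilding the data frame from its columns splits each matrix into
# separate columns named <variable>.<statistic>.
flat <- do.call(data.frame, res)
names(flat)
```

After flattening, columns such as flat$cases.mean can be accessed directly by name.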