I have a data frame with two columns. First column contains categories such as "First", "Second", "Third", and the second column has numbers that represent the number of times I saw the specific groups from "Category".
For example:
Category Frequency First 10 First 15 First 5 Second 2 Third 14 Third 20 Second 3
I want to sort the data by Category and sum all the Frequencies:
Category Frequency First 30 Second 5 Third 34
How would I do this in R?
Now we can use the group_by and the summarise_at functions to get the summation by group: iris %>% # Specify data frame group_by(Species) %>% # Specify group indicator summarise_at(vars(Sepal. Length), # Specify column list(name = sum)) # Specify function # A tibble: 3 x 2 # Species name # <fct> <dbl> # 1 setosa 250.
The function n() returns the number of observations in a current group.
Using aggregate
:
aggregate(x$Frequency, by=list(Category=x$Category), FUN=sum) Category x 1 First 30 2 Second 5 3 Third 34
In the example above, multiple dimensions can be specified in the list
. Multiple aggregated metrics of the same data type can be incorporated via cbind
:
aggregate(cbind(x$Frequency, x$Metric2, x$Metric3) ...
(embedding @thelatemail comment), aggregate
has a formula interface too
aggregate(Frequency ~ Category, x, sum)
Or if you want to aggregate multiple columns, you could use the .
notation (works for one column too)
aggregate(. ~ Category, x, sum)
or tapply
:
tapply(x$Frequency, x$Category, FUN=sum) First Second Third 30 5 34
Using this data:
x <- data.frame(Category=factor(c("First", "First", "First", "Second", "Third", "Third", "Second")), Frequency=c(10,15,5,2,14,20,3))
You can also use the dplyr package for that purpose:
library(dplyr) x %>% group_by(Category) %>% summarise(Frequency = sum(Frequency)) #Source: local data frame [3 x 2] # # Category Frequency #1 First 30 #2 Second 5 #3 Third 34
Or, for multiple summary columns (works with one column too):
x %>% group_by(Category) %>% summarise(across(everything(), sum))
Here are some more examples of how to summarise data by group using dplyr functions using the built-in dataset mtcars
:
# several summary columns with arbitrary names mtcars %>% group_by(cyl, gear) %>% # multiple group columns summarise(max_hp = max(hp), mean_mpg = mean(mpg)) # multiple summary columns # summarise all columns except grouping columns using "sum" mtcars %>% group_by(cyl) %>% summarise(across(everything(), sum)) # summarise all columns except grouping columns using "sum" and "mean" mtcars %>% group_by(cyl) %>% summarise(across(everything(), list(mean = mean, sum = sum))) # multiple grouping columns mtcars %>% group_by(cyl, gear) %>% summarise(across(everything(), list(mean = mean, sum = sum))) # summarise specific variables, not all mtcars %>% group_by(cyl, gear) %>% summarise(across(c(qsec, mpg, wt), list(mean = mean, sum = sum))) # summarise specific variables (numeric columns except grouping columns) mtcars %>% group_by(gear) %>% summarise(across(where(is.numeric), list(mean = mean, sum = sum)))
For more information, including the %>%
operator, see the introduction to dplyr.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With