
How to sum a variable by group

Tags: dataframe, r, r-faq, aggregate

I have a data frame with two columns. The first column contains categories such as "First", "Second", and "Third", and the second column contains the number of times I saw each group from "Category".

For example:

    Category     Frequency
    First        10
    First        15
    First        5
    Second       2
    Third        14
    Third        20
    Second       3

I want to sort the data by Category and sum all the Frequencies:

    Category     Frequency
    First        30
    Second       5
    Third        34

How would I do this in R?

263

asked Nov 02 '09 09:11

user5243421

2 Answers

Using aggregate:

    aggregate(x$Frequency, by=list(Category=x$Category), FUN=sum)
      Category  x
    1    First 30
    2   Second  5
    3    Third 34

In the example above, multiple grouping dimensions can be specified in the list passed to by. Multiple aggregated metrics of the same data type can be combined via cbind:

aggregate(cbind(x$Frequency, x$Metric2, x$Metric3) ...
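The call above is truncated in the original answer. As a rough sketch of how a complete call might look, the block below adds two hypothetical columns, Group and Metric2, to the sample data (neither appears in the question) and sums two metrics over two grouping dimensions.

```r
# Sample data from this answer, plus two hypothetical columns for illustration
x <- data.frame(
  Category  = factor(c("First", "First", "First", "Second",
                       "Third", "Third", "Second")),
  Frequency = c(10, 15, 5, 2, 14, 20, 3)
)
x$Group   <- c("a", "b", "a", "a", "b", "b", "a")  # assumed second grouping dimension
x$Metric2 <- c(1, 2, 3, 4, 5, 6, 7)                 # assumed second metric

# Two grouping dimensions in `by`, two metrics bound together with cbind().
# Naming the cbind() arguments gives the result columns readable names.
res <- aggregate(
  cbind(Frequency = x$Frequency, Metric2 = x$Metric2),
  by  = list(Category = x$Category, Group = x$Group),
  FUN = sum
)
res
```

Only the Category/Group combinations that occur in the data appear in the result, one row each.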

(Embedding @thelatemail's comment.) aggregate has a formula interface too:

aggregate(Frequency ~ Category, x, sum)

Or, if you want to aggregate multiple columns, you could use the . notation (this works for one column too):

aggregate(. ~ Category, x, sum)
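One point worth knowing about the formula interface is that it defaults to na.action = na.omit, so a row with a missing value in any column is dropped before anything is summed. The sketch below illustrates this with a hypothetical Metric2 column containing one NA (the column is an assumption, not part of the question's data).

```r
x <- data.frame(
  Category  = factor(c("First", "First", "First", "Second",
                       "Third", "Third", "Second")),
  Frequency = c(10, 15, 5, 2, 14, 20, 3),
  Metric2   = c(1, NA, 3, 4, 5, 6, 7)  # hypothetical column with one missing value
)

# Default na.action = na.omit: the whole second row is dropped before summing,
# so the Frequency total for "First" also loses that row's value
default_res <- aggregate(. ~ Category, x, sum)

# Keep every row and let sum() skip the NA instead
keep_res <- aggregate(. ~ Category, x, sum, na.rm = TRUE, na.action = na.pass)

default_res
keep_res
```

If the totals from the formula interface look too small, missing values in another column are a likely cause.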

or tapply:

    tapply(x$Frequency, x$Category, FUN=sum)
     First Second  Third
        30      5     34
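Unlike aggregate, tapply returns a named array rather than a data frame. If the two-column layout from the question is wanted, one possible conversion is sketched below; the names totals and out are just illustrative.

```r
x <- data.frame(
  Category  = factor(c("First", "First", "First", "Second",
                       "Third", "Third", "Second")),
  Frequency = c(10, 15, 5, 2, 14, 20, 3)
)

# tapply() gives one sum per factor level, named by level
totals <- tapply(x$Frequency, x$Category, FUN = sum)

# Rebuild a two-column data frame from the names and the values
out <- data.frame(
  Category  = names(totals),
  Frequency = as.vector(totals)
)
out
```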

Using this data:

    x <- data.frame(Category=factor(c("First", "First", "First", "Second",
                                      "Third", "Third", "Second")),
                    Frequency=c(10,15,5,2,14,20,3))

148

answered Oct 02 '22 03:10

rcs

You can also use the dplyr package for that purpose:

    library(dplyr)
    x %>%
      group_by(Category) %>%
      summarise(Frequency = sum(Frequency))

    #Source: local data frame [3 x 2]
    #
    #  Category Frequency
    #1    First        30
    #2   Second         5
    #3    Third        34

Or, for multiple summary columns (works with one column too):

    x %>%
      group_by(Category) %>%
      summarise(across(everything(), sum))

Here are some more examples of how to summarise data by group with dplyr, using the built-in mtcars dataset:

    # several summary columns with arbitrary names
    mtcars %>%
      group_by(cyl, gear) %>%                            # multiple group columns
      summarise(max_hp = max(hp), mean_mpg = mean(mpg))  # multiple summary columns

    # summarise all columns except grouping columns using "sum"
    mtcars %>%
      group_by(cyl) %>%
      summarise(across(everything(), sum))

    # summarise all columns except grouping columns using "sum" and "mean"
    mtcars %>%
      group_by(cyl) %>%
      summarise(across(everything(), list(mean = mean, sum = sum)))

    # multiple grouping columns
    mtcars %>%
      group_by(cyl, gear) %>%
      summarise(across(everything(), list(mean = mean, sum = sum)))

    # summarise specific variables, not all
    mtcars %>%
      group_by(cyl, gear) %>%
      summarise(across(c(qsec, mpg, wt), list(mean = mean, sum = sum)))

    # summarise specific variables (numeric columns except grouping columns)
    mtcars %>%
      group_by(gear) %>%
      summarise(across(where(is.numeric), list(mean = mean, sum = sum)))
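When there are several grouping columns, summarise() by default removes only the last grouping level, so the result of the cyl/gear examples is still grouped by cyl. If an ungrouped result is preferred, the .groups argument (available in dplyr 1.0.0 and later) can be set explicitly. A minimal sketch, where by_cyl_gear is just an illustrative name:

```r
library(dplyr)

# One row per (cyl, gear) combination, with all grouping removed from the result
by_cyl_gear <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(mean_mpg = mean(mpg), .groups = "drop")

# FALSE here means later verbs will operate on the whole table, not per cyl
is_grouped_df(by_cyl_gear)
```

Setting .groups also silences the message summarise() otherwise prints about the grouping of its output.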

For more information, including the %>% operator, see the introduction to dplyr.

answered Oct 02 '22 03:10

talat
