How to calculate mean of all columns, by group?

Tags: r, group-by, mean

I need to get the mean of all columns of a large data set using R, grouped by 2 variables.

Let's try it with mtcars:

library(dplyr)
g_mtcars <- group_by(mtcars, cyl, gear)
summarise(g_mtcars, mean(hp))

# Source: local data frame [8 x 3]
# Groups: cyl [?]
# 
#     cyl  gear `mean(hp)`
#   <dbl> <dbl>      <dbl>
# 1     4     3    97.0000
# 2     4     4    76.0000
# 3     4     5   102.0000
# 4     6     3   107.5000
# 5     6     4   116.5000
# 6     6     5   175.0000
# 7     8     3   194.1667
# 8     8     5   299.5000

It works for "hp", but I need the mean of every other column of mtcars (except "cyl" and "gear", which form the groups). The real data set is large, with many columns, so typing each one by hand, like summarise(g_mtcars, mean(hp), mean(drat), mean(wt), ...), is not practical.


asked Dec 03 '16 11:12

Miguel Rozsas

3 Answers

Edit 2: Recent versions of dplyr (1.0.0 and later) recommend using the regular summarise() with across(), as in:

library(dplyr)
mtcars %>%
    group_by(cyl, gear) %>%
    summarise(across(everything(), mean))
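If the real data set also contains non-numeric columns (an ID or a label, say), everything() would try to average them too, and mean() returns NA with a warning for character input. A minimal sketch, assuming dplyr 1.0.0 or later: select only numeric columns with where(is.numeric), pass na.rm = TRUE through an anonymous function, and drop the grouping with .groups = "drop". The df object and its model column are hypothetical, added only to show a character column being skipped.

```r
library(dplyr)

# Illustration only: a copy of mtcars with a character column that must not be averaged
df <- mtcars
df$model <- rownames(mtcars)

res <- df %>%
  group_by(cyl, gear) %>%
  summarise(
    # average every numeric, non-grouping column, ignoring missing values
    across(where(is.numeric), function(x) mean(x, na.rm = TRUE)),
    .groups = "drop"  # return an ungrouped tibble
  )

res
```

The grouping columns are never part of the across() selection, so cyl and gear appear once, as keys, and model is left out because it is not numeric.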

What you're looking for is either ?summarise_all or ?summarise_each from dplyr (summarise_each() has since been deprecated in favour of summarise_all()).

Edit: full code:

library(dplyr)
mtcars %>% 
    group_by(cyl, gear) %>%
    summarise_all("mean")

# Source: local data frame [8 x 11]
# Groups: cyl [?]
# 
#     cyl  gear    mpg     disp       hp     drat       wt    qsec    vs    am     carb
#   <dbl> <dbl>  <dbl>    <dbl>    <dbl>    <dbl>    <dbl>   <dbl> <dbl> <dbl>    <dbl>
# 1     4     3 21.500 120.1000  97.0000 3.700000 2.465000 20.0100   1.0  0.00 1.000000
# 2     4     4 26.925 102.6250  76.0000 4.110000 2.378125 19.6125   1.0  0.75 1.500000
# 3     4     5 28.200 107.7000 102.0000 4.100000 1.826500 16.8000   0.5  1.00 2.000000
# 4     6     3 19.750 241.5000 107.5000 2.920000 3.337500 19.8300   1.0  0.00 1.000000
# 5     6     4 19.750 163.8000 116.5000 3.910000 3.093750 17.6700   0.5  0.50 4.000000
# 6     6     5 19.700 145.0000 175.0000 3.620000 2.770000 15.5000   0.0  1.00 6.000000
# 7     8     3 15.050 357.6167 194.1667 3.120833 4.104083 17.1425   0.0  0.00 3.083333
# 8     8     5 15.400 326.0000 299.5000 3.880000 3.370000 14.5500   0.0  1.00 6.000000
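Since the scoped verbs such as summarise_all() are superseded by across() in dplyr 1.0.0 and later, a quick check that both forms return the same numbers can be reassuring when migrating old code. A small sketch of such a check; old_style and new_style are just illustrative names, and as.data.frame() is used so that only the values are compared:

```r
library(dplyr)

g <- group_by(mtcars, cyl, gear)

# superseded scoped verb vs. the across() form from Edit 2
old_style <- summarise_all(g, mean)
new_style <- summarise(g, across(everything(), mean), .groups = "drop_last")

# compare as plain data frames so tibble/grouping metadata is ignored
all.equal(as.data.frame(old_style), as.data.frame(new_style))
```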

answered Oct 05 '22 22:10

Wojciech Książek

aggregate() is the easiest way to do this in base R:

aggregate(. ~ cyl + gear, data = mtcars, FUN = mean)
#   cyl gear    mpg     disp       hp     drat       wt    qsec  vs   am     carb
# 1   4    3 21.500 120.1000  97.0000 3.700000 2.465000 20.0100 1.0 0.00 1.000000
# 2   6    3 19.750 241.5000 107.5000 2.920000 3.337500 19.8300 1.0 0.00 1.000000
# 3   8    3 15.050 357.6167 194.1667 3.120833 4.104083 17.1425 0.0 0.00 3.083333
# 4   4    4 26.925 102.6250  76.0000 4.110000 2.378125 19.6125 1.0 0.75 1.500000
# 5   6    4 19.750 163.8000 116.5000 3.910000 3.093750 17.6700 0.5 0.50 4.000000
# 6   4    5 28.200 107.7000 102.0000 4.100000 1.826500 16.8000 0.5 1.00 2.000000
# 7   6    5 19.700 145.0000 175.0000 3.620000 2.770000 15.5000 0.0 1.00 6.000000
# 8   8    5 15.400 326.0000 299.5000 3.880000 3.370000 14.5500 0.0 1.00 6.000000
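aggregate() also has a data-frame interface that makes the column selection explicit. One difference worth knowing: the formula method's default na.action = na.omit drops any row with a missing value in any column, whereas the sketch below passes na.rm = TRUE on to mean(), so missing values are dropped column by column. value_cols is only an illustrative name.

```r
# every column except the grouping variables
value_cols <- setdiff(names(mtcars), c("cyl", "gear"))

res <- aggregate(
  mtcars[value_cols],             # columns to summarise
  by = mtcars[c("cyl", "gear")],  # grouping variables (a data frame is a list)
  FUN = mean,
  na.rm = TRUE                    # passed on to mean()
)

res
```

On mtcars, which has no missing values, this gives the same table as the formula call above.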

answered Oct 05 '22 21:10

Gregor Thomas

Using data.table (however, you can't call setDT(mtcars) directly because its binding is locked, so copy it to a different name, such as mt_dt, and try):

library(data.table)
mt_dt = as.data.table(mtcars)
mt_dt[ , lapply(.SD, mean), by = c("cyl", "gear")]
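Two data.table options may be handy on a large table: keyby instead of by sorts the result by the grouping columns and sets them as the key (by keeps groups in order of first appearance), and .SDcols restricts which columns .SD contains. A sketch assuming only mpg, hp and wt are of interest, an arbitrary subset chosen for illustration; extra arguments such as na.rm = TRUE are passed through lapply() to mean().

```r
library(data.table)

mt_dt <- as.data.table(mtcars)

res <- mt_dt[, lapply(.SD, mean, na.rm = TRUE),
             keyby = .(cyl, gear),            # sort by, and key on, the groups
             .SDcols = c("mpg", "hp", "wt")]  # illustrative subset of columns

res
```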

answered Oct 05 '22 23:10

joel.wilson
