How to get summary statistics by group

Tags:

r

s

I'm trying to get multiple summary statistics in R/S-PLUS, grouped by a categorical column, in one shot. I found a couple of functions, but all of them compute one statistic per call, like aggregate().

    data <- c(62, 60, 63, 59, 63, 67, 71, 64, 65, 66, 68, 66,
              71, 67, 68, 68, 56, 62, 60, 61, 63, 64, 63, 59)
    grp <- factor(rep(LETTERS[1:4], c(4,6,6,8)))
    df <- data.frame(group=grp, dt=data)
    # 'by' must be a list of grouping vectors, not a bare factor
    mg <- aggregate(df$dt, by=list(group=df$group), FUN=mean)
    mg <- aggregate(df$dt, by=list(group=df$group), FUN=sum)

What I'm looking for is a way to get multiple statistics for the same group (mean, min, max, standard deviation, etc.) in one call. Is that doable?

375

asked Mar 23 '12 22:03

user1289220

2 Answers

1. `tapply`

I'll put in my two cents for tapply().

    tapply(df$dt, df$group, summary)

You could write a custom function with the specific statistics you want or format the results:

    tapply(df$dt, df$group,
      function(x) format(summary(x), scientific = TRUE))
    $A
           Min.     1st Qu.      Median        Mean     3rd Qu.        Max. 
    "5.900e+01" "5.975e+01" "6.100e+01" "6.100e+01" "6.225e+01" "6.300e+01" 

    $B
           Min.     1st Qu.      Median        Mean     3rd Qu.        Max. 
    "6.300e+01" "6.425e+01" "6.550e+01" "6.600e+01" "6.675e+01" "7.100e+01" 

    $C
           Min.     1st Qu.      Median        Mean     3rd Qu.        Max. 
    "6.600e+01" "6.725e+01" "6.800e+01" "6.800e+01" "6.800e+01" "7.100e+01" 

    $D
           Min.     1st Qu.      Median        Mean     3rd Qu.        Max. 
    "5.600e+01" "5.975e+01" "6.150e+01" "6.100e+01" "6.300e+01" "6.400e+01" 
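As a rough sketch of the custom-function idea, the snippet below rebuilds the question's data and computes mean, standard deviation, min and max per group in a single tapply() call. The helper name `my_stats` and the variable names are made up for illustration; swap in whichever statistics you need.

```r
# Rebuild the example data from the question so the snippet runs on its own.
values <- c(62, 60, 63, 59, 63, 67, 71, 64, 65, 66, 68, 66,
            71, 67, 68, 68, 56, 62, 60, 61, 63, 64, 63, 59)
grp <- factor(rep(LETTERS[1:4], c(4, 6, 6, 8)))
df <- data.frame(group = grp, dt = values)

# Hypothetical helper: a named vector holding the statistics of interest.
my_stats <- function(x) {
  c(mean = mean(x), sd = sd(x), min = min(x), max = max(x))
}

# tapply() returns one named vector per group; rbind stacks them
# into a matrix with one row per group and one column per statistic.
stats_by_group <- do.call(rbind, tapply(df$dt, df$group, my_stats))
stats_by_group
```

The result is a plain numeric matrix, so `stats_by_group["A", "mean"]` picks out a single value, and `as.data.frame(stats_by_group)` converts it if a data frame is more convenient.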

2. `data.table`

The data.table package offers a lot of helpful and fast tools for this type of operation:

    library(data.table)
    setDT(df)
    df[, as.list(summary(dt)), by = group]
       group Min. 1st Qu. Median Mean 3rd Qu. Max.
    1:     A   59   59.75   61.0   61   62.25   63
    2:     B   63   64.25   65.5   66   66.75   71
    3:     C   66   67.25   68.0   68   68.00   71
    4:     D   56   59.75   61.5   61   63.00   64
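If you want to choose the statistics yourself rather than take everything summary() returns, a minimal sketch along the same lines is to list them by name inside `.()`. The object names `DT` and `stats` are placeholders, and this assumes data.table is installed.

```r
library(data.table)

# Rebuild the example data from the question as a data.table.
values <- c(62, 60, 63, 59, 63, 67, 71, 64, 65, 66, 68, 66,
            71, 67, 68, 68, 56, 62, 60, 61, 63, 64, 63, 59)
DT <- data.table(group = factor(rep(LETTERS[1:4], c(4, 6, 6, 8))),
                 dt = values)

# One row per group, one named column per statistic.
stats <- DT[, .(mean = mean(dt),
                sd   = sd(dt),
                min  = min(dt),
                max  = max(dt)),
            by = group]
stats
```

Each entry in `.()` becomes a column of the result, so adding another statistic is just one more named expression.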

108

answered Oct 13 '22 05:10

BenBarnes

The dplyr package could be a nice alternative for this problem:

    library(dplyr)

    df %>%
      group_by(group) %>%
      summarize(mean = mean(dt),
                sum = sum(dt))

To get the 1st and 3rd quartiles:

    df %>%
      group_by(group) %>%
      summarize(q1 = quantile(dt, 0.25),
                q3 = quantile(dt, 0.75))
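The same pattern extends to the statistics named in the question. The following is only a sketch, assuming dplyr is installed; it rebuilds the example data so it runs on its own, and the result name `stats` is a placeholder.

```r
library(dplyr)

# Rebuild the example data from the question.
values <- c(62, 60, 63, 59, 63, 67, 71, 64, 65, 66, 68, 66,
            71, 67, 68, 68, 56, 62, 60, 61, 63, 64, 63, 59)
df <- data.frame(group = factor(rep(LETTERS[1:4], c(4, 6, 6, 8))),
                 dt = values)

# Group size, mean, standard deviation, min and max in one call.
stats <- df %>%
  group_by(group) %>%
  summarize(n    = n(),
            mean = mean(dt),
            sd   = sd(dt),
            min  = min(dt),
            max  = max(dt))
stats
```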

answered Oct 13 '22 05:10

Jot eN
