How to get summary statistics by group




I'm trying to get multiple summary statistics in R/S-PLUS, grouped by a categorical column, in one shot. I found a couple of functions, but all of them compute one statistic per call, like aggregate().

    data <- c(62, 60, 63, 59, 63, 67, 71, 64, 65, 66, 68, 66,
              71, 67, 68, 68, 56, 62, 60, 61, 63, 64, 63, 59)
    grp <- factor(rep(LETTERS[1:4], c(4,6,6,8)))
    df <- data.frame(group=grp, dt=data)
    # In R, `by` must be a list of grouping vectors
    mg <- aggregate(df$dt, by=list(df$group), FUN=mean)
    mg <- aggregate(df$dt, by=list(df$group), FUN=sum)

What I'm looking for is a way to get multiple statistics for the same group (mean, min, max, standard deviation, etc.) in one call. Is that doable?

2 Answers

1. tapply

I'll put in my two cents for tapply().

tapply(df$dt, df$group, summary) 

You could write a custom function with the specific statistics you want or format the results:

    tapply(df$dt, df$group,
           function(x) format(summary(x), scientific = TRUE))

    $A
           Min.     1st Qu.      Median        Mean     3rd Qu.        Max.
    "5.900e+01" "5.975e+01" "6.100e+01" "6.100e+01" "6.225e+01" "6.300e+01"

    $B
           Min.     1st Qu.      Median        Mean     3rd Qu.        Max.
    "6.300e+01" "6.425e+01" "6.550e+01" "6.600e+01" "6.675e+01" "7.100e+01"

    $C
           Min.     1st Qu.      Median        Mean     3rd Qu.        Max.
    "6.600e+01" "6.725e+01" "6.800e+01" "6.800e+01" "6.800e+01" "7.100e+01"

    $D
           Min.     1st Qu.      Median        Mean     3rd Qu.        Max.
    "5.600e+01" "5.975e+01" "6.150e+01" "6.100e+01" "6.300e+01" "6.400e+01"
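
For the statistics named in the question, a custom function might look like the sketch below. The helper name `group_stats` is made up for illustration, and the choice of statistics is only an example. Wrapping the result in `do.call(rbind, ...)` stacks the per-group vectors into a single matrix with one row per group.

```r
# Sample data from the question
dt  <- c(62, 60, 63, 59, 63, 67, 71, 64, 65, 66, 68, 66,
         71, 67, 68, 68, 56, 62, 60, 61, 63, 64, 63, 59)
grp <- factor(rep(LETTERS[1:4], c(4, 6, 6, 8)))
df  <- data.frame(group = grp, dt = dt)

# Hypothetical helper: a named vector of statistics for one group
group_stats <- function(x) {
  c(mean = mean(x), sd = sd(x), min = min(x), max = max(x))
}

# tapply() yields one named vector per group; rbind stacks them row by row
stats <- do.call(rbind, tapply(df$dt, df$group, group_stats))
stats
```

Rows are named after the groups and columns after the statistics, so a single value can be pulled out with something like `stats["B", "max"]`.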

2. data.table

The data.table package offers many helpful and fast tools for this type of operation:

    library(data.table)
    setDT(df)
    df[, as.list(summary(dt)), by = group]

       group Min. 1st Qu. Median Mean 3rd Qu. Max.
    1:     A   59   59.75   61.0   61   62.25   63
    2:     B   63   64.25   65.5   66   66.75   71
    3:     C   66   67.25   68.0   68   68.00   71
    4:     D   56   59.75   61.5   61   63.00   64
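
If you want a hand-picked set of statistics rather than what `summary()` returns, the same pattern should work with a named list in `j`. This is a minimal sketch, assuming the data.table package is installed; the column names `mean`, `sd`, `min`, and `max` are just labels chosen here.

```r
library(data.table)

# Sample data from the question, built directly as a data.table
df <- data.table(
  group = factor(rep(LETTERS[1:4], c(4, 6, 6, 8))),
  dt    = c(62, 60, 63, 59, 63, 67, 71, 64, 65, 66, 68, 66,
            71, 67, 68, 68, 56, 62, 60, 61, 63, 64, 63, 59)
)

# .() is data.table's alias for list(); each named element becomes a column
stats <- df[, .(mean = mean(dt), sd = sd(dt), min = min(dt), max = max(dt)),
            by = group]
stats
```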

The dplyr package is another nice option for this problem:

    library(dplyr)

    df %>%
      group_by(group) %>%
      summarize(mean = mean(dt),
                sum = sum(dt))

To get the first and third quartiles:

    df %>%
      group_by(group) %>%
      summarize(q1 = quantile(dt, 0.25),
                q3 = quantile(dt, 0.75))
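
Both calls can also be merged, so every statistic comes back in one table from a single `summarize()`. The sketch below assumes dplyr is installed and rebuilds the question's data so it runs on its own; the output column names are arbitrary.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  group = factor(rep(LETTERS[1:4], c(4, 6, 6, 8))),
  dt    = c(62, 60, 63, 59, 63, 67, 71, 64, 65, 66, 68, 66,
            71, 67, 68, 68, 56, 62, 60, 61, 63, 64, 63, 59)
)

# One summarize() call can hold any number of statistics per group
stats <- df %>%
  group_by(group) %>%
  summarize(
    mean = mean(dt),
    sd   = sd(dt),
    min  = min(dt),
    q1   = quantile(dt, 0.25),
    q3   = quantile(dt, 0.75),
    max  = max(dt)
  )
stats
```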
