
How to get summary statistics by group

Tags: r, s

I'm trying to get multiple summary statistics in R/S-PLUS, grouped by a categorical column, in one shot. I found a couple of functions, but each of them computes only one statistic per call, like aggregate().

    data <- c(62, 60, 63, 59, 63, 67, 71, 64, 65, 66, 68, 66,
              71, 67, 68, 68, 56, 62, 60, 61, 63, 64, 63, 59)
    grp <- factor(rep(LETTERS[1:4], c(4, 6, 6, 8)))
    df <- data.frame(group = grp, dt = data)
    # aggregate() requires `by` to be a list of grouping vectors
    mg <- aggregate(df$dt, by = list(df$group), FUN = mean)
    mg <- aggregate(df$dt, by = list(df$group), FUN = sum)

What I'm looking for is a way to get multiple statistics for the same group (mean, min, max, standard deviation, etc.) in one call. Is that doable?

asked Mar 23 '12 22:03 by user1289220




2 Answers

1. tapply

I'll put in my two cents for tapply().

tapply(df$dt, df$group, summary) 

You could write a custom function with the specific statistics you want or format the results:

    tapply(df$dt, df$group,
           function(x) format(summary(x), scientific = TRUE))
    $A
           Min.     1st Qu.      Median        Mean     3rd Qu.        Max.
    "5.900e+01" "5.975e+01" "6.100e+01" "6.100e+01" "6.225e+01" "6.300e+01"

    $B
           Min.     1st Qu.      Median        Mean     3rd Qu.        Max.
    "6.300e+01" "6.425e+01" "6.550e+01" "6.600e+01" "6.675e+01" "7.100e+01"

    $C
           Min.     1st Qu.      Median        Mean     3rd Qu.        Max.
    "6.600e+01" "6.725e+01" "6.800e+01" "6.800e+01" "6.800e+01" "7.100e+01"

    $D
           Min.     1st Qu.      Median        Mean     3rd Qu.        Max.
    "5.600e+01" "5.975e+01" "6.150e+01" "6.100e+01" "6.300e+01" "6.400e+01"
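
The custom-function route suggested above could look something like the sketch below. It is only an illustration: `my_stats` and `stats_by_group` are made-up names, and the choice of statistics (mean, sd, min, max) is an assumption based on what the question lists. The data is rebuilt so the snippet runs on its own.

```r
# Same sample data as in the question.
data <- c(62, 60, 63, 59, 63, 67, 71, 64, 65, 66, 68, 66,
          71, 67, 68, 68, 56, 62, 60, 61, 63, 64, 63, 59)
grp <- factor(rep(LETTERS[1:4], c(4, 6, 6, 8)))
df <- data.frame(group = grp, dt = data)

# Hypothetical helper: returns the statistics of interest as a named vector.
my_stats <- function(x) {
  c(mean = mean(x), sd = sd(x), min = min(x), max = max(x))
}

# tapply() yields one named vector per group; rbind() stacks them into a
# matrix with one row per group and one column per statistic.
stats_by_group <- do.call(rbind, tapply(df$dt, df$group, my_stats))
print(stats_by_group)
```

Binding the per-group vectors into a matrix makes it easy to pick out a single value, for example `stats_by_group["B", "max"]`.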

2. data.table

The data.table package offers a lot of helpful and fast tools for this type of operation:

    library(data.table)
    setDT(df)
    df[, as.list(summary(dt)), by = group]
       group Min. 1st Qu. Median Mean 3rd Qu. Max.
    1:     A   59   59.75   61.0   61   62.25   63
    2:     B   63   64.25   65.5   66   66.75   71
    3:     C   66   67.25   68.0   68   68.00   71
    4:     D   56   59.75   61.5   61   63.00   64
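
If you want statistics that summary() does not report, such as the standard deviation, one possible variation is to name each statistic explicitly in `j`. This is a rough sketch, assuming the data.table package is installed; `scores` and `by_group` are hypothetical names, and the data is rebuilt so the snippet is self-contained.

```r
library(data.table)

# Same sample data as in the question, built directly as a data.table.
scores <- data.table(
  group = factor(rep(LETTERS[1:4], c(4, 6, 6, 8))),
  dt = c(62, 60, 63, 59, 63, 67, 71, 64, 65, 66, 68, 66,
         71, 67, 68, 68, 56, 62, 60, 61, 63, 64, 63, 59)
)

# .() is data.table's alias for list(); each named element becomes a column,
# evaluated once per group.
by_group <- scores[, .(mean = mean(dt), sd = sd(dt),
                       min = min(dt), max = max(dt)),
                   by = group]
print(by_group)
```
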

answered Oct 13 '22 05:10 by BenBarnes


The dplyr package could be a nice alternative for this problem:

    library(dplyr)

    df %>%
      group_by(group) %>%
      summarize(mean = mean(dt),
                sum = sum(dt))

To get the 1st and 3rd quartiles:

    df %>%
      group_by(group) %>%
      summarize(q1 = quantile(dt, 0.25),
                q3 = quantile(dt, 0.75))

answered Oct 13 '22 05:10 by Jot eN