I'm trying to get multiple summary statistics in R/S-PLUS, grouped by a categorical column, in one shot. I found a couple of functions, but all of them compute one statistic per call, like aggregate().
data <- c(62, 60, 63, 59, 63, 67, 71, 64, 65, 66, 68, 66, 71, 67, 68, 68, 56, 62, 60, 61, 63, 64, 63, 59)
grp <- factor(rep(LETTERS[1:4], c(4, 6, 6, 8)))
df <- data.frame(group = grp, dt = data)
# 'by' must be a list of grouping variables, not a bare factor
mg <- aggregate(df$dt, by = list(group = df$group), FUN = mean)
ms <- aggregate(df$dt, by = list(group = df$group), FUN = sum)
What I'm looking for is a way to get multiple statistics for the same group (mean, min, max, standard deviation, etc.) in one call. Is that doable?
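For what it's worth, aggregate() is not strictly limited to one statistic per call: if FUN returns a named vector, the result stores one matrix column holding every statistic. The sketch below uses the formula interface on the question's data; the particular set of statistics and the names res and res_flat are just illustrative choices.

```r
# Recreate the example data from the question.
data <- c(62, 60, 63, 59, 63, 67, 71, 64, 65, 66, 68, 66,
          71, 67, 68, 68, 56, 62, 60, 61, 63, 64, 63, 59)
grp <- factor(rep(LETTERS[1:4], c(4, 6, 6, 8)))
df <- data.frame(group = grp, dt = data)

# FUN returns a named vector, so aggregate() stores the results
# as a matrix column called "dt" with one column per statistic.
res <- aggregate(dt ~ group, data = df,
                 FUN = function(x) c(mean = mean(x), sd = sd(x),
                                     min = min(x), max = max(x)))

# Flatten the matrix column into ordinary data frame columns
# (they come out as dt.mean, dt.sd, dt.min, dt.max).
res_flat <- do.call(data.frame, res)
print(res_flat)
```

The do.call(data.frame, ...) step is optional; it only makes the result easier to export or index by column name.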
tapply
I'll put in my two cents for tapply():
tapply(df$dt, df$group, summary)
You could write a custom function with the specific statistics you want or format the results:
tapply(df$dt, df$group, function(x) format(summary(x), scientific = TRUE))
$A
       Min.     1st Qu.      Median        Mean     3rd Qu.        Max.
"5.900e+01" "5.975e+01" "6.100e+01" "6.100e+01" "6.225e+01" "6.300e+01"

$B
       Min.     1st Qu.      Median        Mean     3rd Qu.        Max.
"6.300e+01" "6.425e+01" "6.550e+01" "6.600e+01" "6.675e+01" "7.100e+01"

$C
       Min.     1st Qu.      Median        Mean     3rd Qu.        Max.
"6.600e+01" "6.725e+01" "6.800e+01" "6.800e+01" "6.800e+01" "7.100e+01"

$D
       Min.     1st Qu.      Median        Mean     3rd Qu.        Max.
"5.600e+01" "5.975e+01" "6.150e+01" "6.100e+01" "6.300e+01" "6.400e+01"
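As a sketch of the custom-function route: tapply() returns one vector per group, and stacking those vectors with rbind() gives a single table with one row per group. The helper my_stats is hypothetical; swap in whichever statistics you need.

```r
# Recreate the example data from the question.
data <- c(62, 60, 63, 59, 63, 67, 71, 64, 65, 66, 68, 66,
          71, 67, 68, 68, 56, 62, 60, 61, 63, 64, 63, 59)
grp <- factor(rep(LETTERS[1:4], c(4, 6, 6, 8)))
df <- data.frame(group = grp, dt = data)

# Hypothetical helper: the statistics to compute for each group.
my_stats <- function(x) {
  c(mean = mean(x), sd = sd(x), min = min(x), max = max(x), n = length(x))
}

# tapply() returns a list holding one named vector per group ...
per_group <- tapply(df$dt, df$group, my_stats)

# ... which rbind() stacks into a matrix: one row per group,
# one column per statistic.
stats_matrix <- do.call(rbind, per_group)
print(round(stats_matrix, 2))
```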
data.table
The data.table package offers a lot of helpful and fast tools for these types of operations:
library(data.table)
setDT(df)
df[, as.list(summary(dt)), by = group]
   group Min. 1st Qu. Median Mean 3rd Qu. Max.
1:     A   59   59.75   61.0   61   62.25   63
2:     B   63   64.25   65.5   66   66.75   71
3:     C   66   67.25   68.0   68   68.00   71
4:     D   56   59.75   61.5   61   63.00   64
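If you want a hand-picked set of statistics rather than the fixed output of summary(), the j argument can return a named list that is evaluated once per group. A minimal sketch, assuming the data.table package is installed; the chosen statistics and the name res are illustrative.

```r
library(data.table)

# Recreate the example data from the question.
data <- c(62, 60, 63, 59, 63, 67, 71, 64, 65, 66, 68, 66,
          71, 67, 68, 68, 56, 62, 60, 61, 63, 64, 63, 59)
grp <- factor(rep(LETTERS[1:4], c(4, 6, 6, 8)))
df <- data.frame(group = grp, dt = data)
setDT(df)  # convert to a data.table by reference

# .() is shorthand for list(); .N is the number of rows in each group.
res <- df[, .(mean = mean(dt), sd = sd(dt),
              min = min(dt), max = max(dt), n = .N),
          by = group]
print(res)
```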
The dplyr package could be a nice alternative for this problem:
library(dplyr)
df %>%
  group_by(group) %>%
  summarize(mean = mean(dt), sum = sum(dt))
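When the list of statistics gets long, across() can apply a named list of functions to a column inside summarize(), producing columns such as dt_mean and dt_sd. This sketch assumes dplyr 1.0.0 or later (where across() is available); the statistics chosen and the name res are illustrative.

```r
library(dplyr)  # across() requires dplyr >= 1.0.0

# Recreate the example data from the question.
data <- c(62, 60, 63, 59, 63, 67, 71, 64, 65, 66, 68, 66,
          71, 67, 68, 68, 56, 62, 60, 61, 63, 64, 63, 59)
grp <- factor(rep(LETTERS[1:4], c(4, 6, 6, 8)))
df <- data.frame(group = grp, dt = data)

# Each function in the list is applied to dt within each group;
# output columns are named "{column}_{function name}" by default.
res <- df %>%
  group_by(group) %>%
  summarize(across(dt, list(mean = mean, sd = sd, min = min, max = max)),
            n = n())
print(res)
```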
To get the 1st and 3rd quartiles:
df %>% group_by(group) %>% summarize(q1 = quantile(dt, 0.25), q3 = quantile(dt, 0.75))