
Summary stats by factor level for multiple variables [duplicate]

Tags: r, summary

I want to produce data frames containing summary statistics for each factor level, for multiple variables.

For example, if I have the following data frame

Factor <- c("A","A","A","B","B","B")
Variable1 <- c(3,4,5,4,5,3)
Variable2 <- c(7,9,14,16,10,10)
mydf <- data.frame(Factor, Variable1, Variable2)
mydf
  Factor Variable1 Variable2
1      A         3         7
2      A         4         9
3      A         5        14
4      B         4        16
5      B         5        10
6      B         3        10

and I have the following function that I want to use to produce my summary stats:

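# SEM() is not defined above; it is assumed here to be sd / sqrt(n), which matches the SeM values shown below
my.summary <- function(x, na.rm = TRUE) {
  c(n = length(x), Mean = mean(x, na.rm = na.rm), SD = sd(x, na.rm = na.rm),
    SeM = sd(x, na.rm = na.rm) / sqrt(length(x)),
    Median = median(x, na.rm = na.rm), Min = min(x, na.rm = na.rm), Max = max(x, na.rm = na.rm))
}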

To apply this to factor levels of Variable1 I can do this:

library(plyr)  # provides ddply
ddply(mydf, "Factor", function(x) my.summary(x$Variable1))
  Factor n Mean SD       SeM Median Min Max
1      A 3    4  1 0.5773503      4   3   5
2      B 3    4  1 0.5773503      4   3   5

Now I can do the same for Variable2:

ddply(mydf, c("Factor"), function(x) my.summary(x$Variable2))

This is easy enough with just two variables, but with many variables it becomes a pain. How can I produce a data frame of the summary stats for each variable and factor level without having to adjust the code each time?

I have tried aggregate.data.frame, but it doesn't work with my.summary. It works with summary, but produces one big data frame.

Thanks

Rory Shaw asked May 12 '26 18:05


1 Answer

You could use summarise_each from dplyr, after modifying your function to return a list, so that each group yields a single list element per variable:

my.summary <- function(x, na.rm = TRUE) {
  list(c(n = length(x),
         Mean = mean(x, na.rm = na.rm), SD = sd(x, na.rm = na.rm),
         Median = median(x, na.rm = na.rm), Min = min(x, na.rm = na.rm), Max = max(x, na.rm = na.rm)))
}

Then apply it to every non-grouping column:

library(dplyr)

mydf %>% group_by(Factor) %>%
         summarise_each(funs(my.summary(.)))
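
summarise_each() and funs() have since been deprecated in dplyr, so the call in this answer may warn or fail on a current installation. A rough equivalent, assuming dplyr 1.0.0 or later, is summarise() with across(). This is only a sketch: the names res and var1 are made up for illustration, my.summary goes back to returning a plain named vector (the list() wrapper moves into the across() call), and SeM is left out as in the answer.

```r
library(dplyr)  # assumes dplyr >= 1.0.0, where across() is available

# Example data from the question
Factor <- c("A", "A", "A", "B", "B", "B")
Variable1 <- c(3, 4, 5, 4, 5, 3)
Variable2 <- c(7, 9, 14, 16, 10, 10)
mydf <- data.frame(Factor, Variable1, Variable2)

# Summary function returning a plain named numeric vector
my.summary <- function(x, na.rm = TRUE) {
  c(n = length(x),
    Mean = mean(x, na.rm = na.rm),
    SD = sd(x, na.rm = na.rm),
    Median = median(x, na.rm = na.rm),
    Min = min(x, na.rm = na.rm),
    Max = max(x, na.rm = na.rm))
}

# One row per factor level and one list-column per variable;
# across() skips the grouping column automatically
res <- mydf %>%
  group_by(Factor) %>%
  summarise(across(everything(), ~ list(my.summary(.x))))

# Unpack one variable's list-column into an ordinary data frame
# (hypothetical name var1; repeat or loop for the other variables)
var1 <- data.frame(Factor = res$Factor, do.call(rbind, res$Variable1))
var1
```

Each list element holds the full named vector for one factor level, so binding the elements by row gives the same layout as the ddply output in the question, minus the SeM column.
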
jeremycg answered May 15 '26 09:05


