Is there a way to add extra statistics to a summarize_at call? For example,
iris %>% group_by(Species) %>% summarise_at(vars(Sepal.Length:Petal.Width), funs(mean, sd))
will compute the means and standard deviations of the 4 measurement columns (giving a total of 8 columns). Suppose I also wanted to know how many rows were in each group, i.e. something like:
# Below does not work; it only illustrates the intent
iris %>%
  group_by(Species) %>%
  summarise_at(vars(Sepal.Length:Petal.Width), funs(mean, sd)) + summarise(n())
Given that the above does not work, a kludge is
iris %>% group_by(Species) %>% summarise_at(vars(Sepal.Length:Petal.Width), funs(mean, sd, length))
which produces, in effect, 4 identical copies of the count column.
Perhaps this is beyond what can be conveniently handled by summarize_at and friends?
How about this:
iris %>%
  group_by(Species) %>%
  mutate(Count = n()) %>%
  group_by(Species, Count) %>%
  summarize_at(vars(Sepal.Length:Petal.Width), funs(mean, sd))
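The trick works because `Count` is constant within each species, so adding it to the grouping does not split any group; it just carries the count through to the summary. As a hedged aside, assuming dplyr 1.0 or later (where `funs()` is deprecated in favour of `across()`), the count and the per-column statistics can be computed in a single `summarise()` call:

```r
library(dplyr)

# Assumes dplyr >= 1.0, where across() can be used inside summarise()
res <- iris %>%
  group_by(Species) %>%
  summarise(
    n = n(),                                                     # row count per group
    across(Sepal.Length:Petal.Width, list(mean = mean, sd = sd)) # 4 columns x 2 statistics
  )
res
```

The output has one row per species and 10 columns: `Species`, `n`, then `Sepal.Length_mean`, `Sepal.Length_sd` and so on (the default `{.col}_{.fn}` naming).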
We can do this with data.table in a more flexible way:
library(data.table)
as.data.table(iris)[, c(n = .N, unlist(lapply(.SD, function(x)
    list(Mean = mean(x), SD = sd(x))), recursive = FALSE)), .(Species)]
# Species n Sepal.Length.Mean Sepal.Length.SD Sepal.Width.Mean Sepal.Width.SD Petal.Length.Mean Petal.Length.SD Petal.Width.Mean
#1: setosa 50 5.006 0.3524897 3.428 0.3790644 1.462 0.1736640 0.246
#2: versicolor 50 5.936 0.5161711 2.770 0.3137983 4.260 0.4699110 1.326
#3: virginica 50 6.588 0.6358796 2.974 0.3224966 5.552 0.5518947 2.026
# Petal.Width.SD
#1: 0.1053856
#2: 0.1977527
#3: 0.2746501
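To restrict the statistics to a subset of columns, the same data.table call can take `.SDcols`. A sketch that selects the `Sepal` columns with `grep()`; the names `dt`, `sepal_cols` and `res_dt` are just illustrative:

```r
library(data.table)

dt <- as.data.table(iris)
# Character vector of the columns whose names start with "Sepal"
sepal_cols <- grep("^Sepal", names(dt), value = TRUE)

# .SD now contains only the columns listed in .SDcols
res_dt <- dt[, c(n = .N,
                 unlist(lapply(.SD, function(x) list(Mean = mean(x), SD = sd(x))),
                        recursive = FALSE)),
             by = .(Species), .SDcols = sepal_cols]
res_dt
```

The result keeps the same naming scheme as above (`Sepal.Length.Mean`, `Sepal.Length.SD`, ...) but only for the selected columns.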
Or, using dplyr, we may need to do a join:
iris1 <- iris %>%
  group_by(Species) %>%
  summarise_all(funs(mean, sd))
iris %>%
  group_by(Species) %>%
  summarise(n = n()) %>%
  full_join(iris1)
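Without `by`, `full_join()` joins on all shared columns (here just `Species`) and prints a message naming the key. A sketch assuming dplyr 1.0 or later that uses `across()` instead of `funs()` and names the key explicitly:

```r
library(dplyr)

# Per-group means and SDs of every non-grouping column (assumes dplyr >= 1.0)
stats <- iris %>%
  group_by(Species) %>%
  summarise(across(everything(), list(mean = mean, sd = sd)))

# Per-group row counts: columns Species and n
counts <- iris %>% count(Species)

# Join on an explicit key so the result does not depend on row order
joined <- full_join(counts, stats, by = "Species")
joined
```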
Or with bind_cols:
iris %>%
  group_by(Species) %>%
  summarise_all(funs(mean, sd)) %>%
  bind_cols(., iris %>% count(Species) %>% select(-Species))
# A tibble: 3 × 10
# Species Sepal.Length_mean Sepal.Width_mean Petal.Length_mean Petal.Width_mean Sepal.Length_sd Sepal.Width_sd Petal.Length_sd Petal.Width_sd n
# <fctr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
#1 setosa 5.006 3.428 1.462 0.246 0.3524897 0.3790644 0.1736640 0.1053856 50
#2 versicolor 5.936 2.770 4.260 1.326 0.5161711 0.3137983 0.4699110 0.1977527 50
#3 virginica 6.588 2.974 5.552 2.026 0.6358796 0.3224966 0.5518947 0.2746501 50
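Note that `bind_cols()` pastes columns side by side by position, not by key, so this relies on both summaries listing the species in the same order (they do here, since both are sorted by the factor levels). A hedged sketch, assuming dplyr 1.0 or later, that checks this before binding:

```r
library(dplyr)

stats <- iris %>%
  group_by(Species) %>%
  summarise(across(everything(), list(mean = mean, sd = sd)))
counts <- iris %>% count(Species)

# bind_cols() matches rows by position: verify the species line up first
stopifnot(identical(as.character(stats$Species), as.character(counts$Species)))

bound <- bind_cols(stats, select(counts, n))
bound
```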
To specify which column(s) to apply the statistics to:
iris %>%
  group_by(Species) %>%
  mutate(Count = n()) %>%
  group_by(Species, Count) %>%
  summarize_at(vars(Sepal.Length), funs(mean, sd)) -> dt_stat
dt_stat
Or, to apply them to all columns starting with "Sepal":
iris %>%
  group_by(Species) %>%
  mutate(Count = n()) %>%
  group_by(Species, Count) %>%
  summarize_at(vars(starts_with("Sepal")), funs(mean, sd)) -> dt_stat2
dt_stat2
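With dplyr 1.0 or later (an assumption about your installed version), the same `starts_with()` selection can be passed to `across()`, and `n()` can sit in the same `summarise()` call, so the extra `Count` grouping is not needed. A sketch; the optional `.names` glue spec here puts the statistic before the column name:

```r
library(dplyr)

dt_stat3 <- iris %>%
  group_by(Species) %>%
  summarise(
    Count = n(),
    # Columns become mean_Sepal.Length, sd_Sepal.Length, mean_Sepal.Width, sd_Sepal.Width
    across(starts_with("Sepal"), list(mean = mean, sd = sd), .names = "{.fn}_{.col}")
  )
dt_stat3
```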