dplyr - summary table for multiple variables

Tags:

r

dplyr

How can I create simple summary statistics for multiple variables using dplyr? The summarise_each function seems to be the way to go; however, when applying multiple functions to multiple columns, the result is a wide, hard-to-read data frame.

312

asked Jan 04 '16 15:01

paljenczy

1 Answer

Use dplyr in combination with tidyr to reshape the end result.

library(dplyr)
library(tidyr)

df <- tbl_df(mtcars)

df.sum <- df %>%
  select(mpg, cyl, vs, am, gear, carb) %>% # select variables to summarise
  summarise_each(funs(min = min, 
                      q25 = quantile(., 0.25), 
                      median = median, 
                      q75 = quantile(., 0.75), 
                      max = max,
                      mean = mean, 
                      sd = sd))

# the result is a wide data frame: 1 row, 6 variables x 7 statistics = 42 columns
dim(df.sum)
# [1]  1 42

# reshape it using tidyr functions

df.stats.tidy <- df.sum %>% gather(stat, val) %>%
  separate(stat, into = c("var", "stat"), sep = "_") %>%
  spread(stat, val) %>%
  select(var, min, q25, median, q75, max, mean, sd) # reorder columns

print(df.stats.tidy)

#    var  min    q25 median  q75  max     mean        sd
# 1   am  0.0  0.000    0.0  1.0  1.0  0.40625 0.4989909
# 2 carb  1.0  2.000    2.0  4.0  8.0  2.81250 1.6152000
# 3  cyl  4.0  4.000    6.0  8.0  8.0  6.18750 1.7859216
# 4 gear  3.0  3.000    4.0  4.0  5.0  3.68750 0.7378041
# 5  mpg 10.4 15.425   19.2 22.8 33.9 20.09062 6.0269481
# 6   vs  0.0  0.000    0.0  1.0  1.0  0.43750 0.5040161
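
Note that summarise_each() and funs() are deprecated in newer dplyr releases, and gather()/spread() are superseded in tidyr. A rough equivalent of the code above, assuming dplyr >= 1.0.0 and tidyr >= 1.0.0, might look like the sketch below. The names vars and df.stats.across are only illustrative and not part of the original answer. The approach still relies on neither the variable names nor the statistic names containing an underscore, since "_" is used to split the column names.

```r
library(dplyr)
library(tidyr)

# variables to summarise (same ones as in the answer above)
vars <- c("mpg", "cyl", "vs", "am", "gear", "carb")

df.stats.across <- as_tibble(mtcars) %>%
  # apply every function in the named list to every selected column;
  # the resulting columns are named <variable>_<statistic>, e.g. mpg_min
  summarise(across(all_of(vars),
                   list(min    = min,
                        q25    = ~ quantile(.x, 0.25, names = FALSE),
                        median = median,
                        q75    = ~ quantile(.x, 0.75, names = FALSE),
                        max    = max,
                        mean   = mean,
                        sd     = sd),
                   .names = "{.col}_{.fn}")) %>%
  # split each column name at "_": the first part goes into `var`,
  # the second part (.value) becomes the name of a statistic column
  pivot_longer(everything(),
               names_to  = c("var", ".value"),
               names_sep = "_")

print(df.stats.across)
```

This should give one row per variable and one column per statistic, with rows in the order the variables were selected rather than sorted alphabetically. The final select() used for reordering is not needed here, because the statistic columns appear in the order they are listed in the function list.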

146

answered Oct 07 '22 01:10

paljenczy
