I would like to be able to use dplyr's split-apply-combine strategy to apply the summary() command.
Take a simple data frame:
```r
library(dplyr)

df <- data.frame(class = c('A', 'A', 'B', 'B'),
                 value = c(100, 120, 800, 880))
```
Ideally we would do something like this:
```r
df %>%
  group_by(class) %>%
  do(summary(.$value))
```
Unfortunately this does not work. Any ideas?
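A quick base R check of what summary() hands back for one group's values helps explain the failure. dplyr's do() requires an unnamed expression to return a data frame, but summary() returns a named numeric vector of class summaryDefault. (A named call such as do(s = summary(.$value)) would run, but it stores each result in a list-column rather than spreading it into columns.) This is a small sketch using the values of class "A" from the example above.

```r
# Values for class "A" from the example data frame
s <- summary(c(100, 120))

class(s)          # "summaryDefault" "table": not a data frame
is.data.frame(s)  # FALSE
names(s)          # "Min." "1st Qu." "Median" "Mean" "3rd Qu." "Max."
```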
Using dplyr to group, manipulate and summarize data: working with large and complex data sets is a day-to-day reality in applied statistics. The dplyr package provides a well-structured set of functions for manipulating such data and performing typical operations with a consistent syntax that is easy to remember.
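A minimal sketch of the group, manipulate, summarize pattern, assuming dplyr is installed. It uses the built-in mtcars data set; the derived column name hp_per_wt is purely illustrative.

```r
library(dplyr)

by_cyl <- mtcars %>%
  mutate(hp_per_wt = hp / wt) %>%   # manipulate: horsepower per 1000 lbs of weight
  group_by(cyl) %>%                 # group: one group per cylinder count (4, 6, 8)
  summarise(n = n(),                # summarize: rows per group
            mean_hp_per_wt = mean(hp_per_wt))

by_cyl
```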
How can you create simple summary statistics from multiple variables with dplyr? The summarise_each() function seems to be the way to go; however, when multiple functions are applied to multiple columns, the result is a wide, hard-to-read data frame. Combining dplyr with tidyr to reshape the end result solves this.
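summarise_each() has since been superseded in dplyr, so this sketch swaps in across() (dplyr 1.0.0 or later) to produce the wide result, then reshapes it with tidyr's pivot_longer() (tidyr 1.0.0 or later). It assumes the variable names contain no underscores, because the reshape splits each column name at "_".

```r
library(dplyr)
library(tidyr)

# Several functions over several columns: one wide row with mpg_mean, mpg_sd, hp_mean, hp_sd
wide <- mtcars %>%
  summarise(across(c(mpg, hp), list(mean = mean, sd = sd)))

# Split each name at "_": the first part becomes a row label, the second its own column
long <- wide %>%
  pivot_longer(everything(),
               names_to = c("variable", ".value"),
               names_sep = "_")

long  # one row per variable, with mean and sd columns
```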
Basic dplyr summarize: the basic summarize() method takes the data as its first argument, followed by a named argument holding a summary call. For example, passing mean = mean(disp) creates a new column named mean that contains the mean of the disp column.
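A minimal sketch of that call, assuming the disp column comes from the built-in mtcars data set:

```r
library(dplyr)

# Data first, then new_column_name = summary_function(column)
res <- summarise(mtcars, mean = mean(disp))

res  # a one-row data frame with a single column, mean
```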
The dplyr package (version 1.0.0 or later) is required. We'll use the function across() to apply a computation across multiple columns. Its arguments are .cols: the columns you want to operate on, which you can pick by position, name, function of name, type, or any combination thereof using Boolean operators; .fns: a function or list of functions to apply to each column; ...:
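A short sketch of across() inside a grouped summary, again using mtcars as an assumed example data set. Because .fns is a named list, the output columns follow the default "{.col}_{.fn}" naming.

```r
library(dplyr)  # across() needs dplyr >= 1.0.0

res <- mtcars %>%
  group_by(cyl) %>%
  # .cols = c(mpg, disp); .fns = named list of two functions
  summarise(across(c(mpg, disp), list(mean = mean, max = max)))

res  # one row per cyl value: cyl, mpg_mean, mpg_max, disp_mean, disp_max
```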
You can use the standard-evaluation (SE) version of data_frame(), that is, data_frame_() (since deprecated), and run:
```r
df %>%
  group_by(class) %>%
  do(data_frame_(summary(.$value)))
```
Alternatively, you can wrap as.list() in data.frame() with the argument check.names = FALSE, which keeps summary()'s labels such as 1st Qu. from being turned into syntactic names:
```r
df %>%
  group_by(class) %>%
  do(data.frame(as.list(summary(.$value)), check.names = FALSE))
```
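The effect of check.names can be checked directly in base R. This sketch compares the column names data.frame() produces from the same summary with and without name checking.

```r
s <- summary(c(100, 120))

# Default: data.frame() applies make.names(), so "1st Qu." becomes "X1st.Qu."
names(data.frame(as.list(s)))

# check.names = FALSE keeps summary()'s own labels unchanged
names(data.frame(as.list(s), check.names = FALSE))
```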
Both versions produce:
```
# Source: local data frame [2 x 7]
# Groups: class [2]
#
#    class   Min. 1st Qu. Median   Mean 3rd Qu.   Max.
#   (fctr)  (dbl)   (dbl)  (dbl)  (dbl)   (dbl)  (dbl)
# 1      A    100     105    110    110     115    120
# 2      B    800     820    840    840     860    880
```
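In current dplyr, do() is superseded; group_modify() (dplyr 0.8.0 or later) is one possible replacement for this pattern. This is a sketch under that assumption, not part of the original answer: the function receives each group's rows as .x, without the grouping column, must return a data frame, and group_modify() adds the class column back.

```r
library(dplyr)

df <- data.frame(class = c('A', 'A', 'B', 'B'),
                 value = c(100, 120, 800, 880))

out <- df %>%
  group_by(class) %>%
  # Return a one-row data frame per group, keeping summary()'s labels as column names
  group_modify(~ data.frame(as.list(summary(.x$value)), check.names = FALSE)) %>%
  ungroup()

out  # same values as the output above, one row per class
```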