 

Using dplyr's do() with summary()

Tags: r, dplyr, summary

I would like to use dplyr's split-apply-combine strategy to apply the summary() function to each group.

Take a simple data frame:

df <- data.frame(class = c('A', 'A', 'B', 'B'),
                 value = c(100, 120, 800, 880))

Ideally we would do something like this:

df %>%
  group_by(class) %>%
  do(summary(.$value))

Unfortunately this does not work. Any ideas?
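A quick base-R check suggests why this fails: when do() is given an unnamed expression, each group's result must be a data frame, but summary() on a numeric vector returns a named vector of class "summaryDefault"/"table". The sketch below uses group A's values as an example (an assumption for illustration) and shows that wrapping the result in as.list() and data.frame() yields the one-row data frame that do() accepts.

```r
# Values of group 'A' from the example data frame
v <- c(100, 120)

s <- summary(v)
# summary() returns a classed, named numeric vector, not a data frame,
# which is why do() rejects it
print(class(s))
print(is.data.frame(s))

# as.list() gives one list element per statistic; data.frame() turns that
# into a single row, and check.names = FALSE keeps names such as "1st Qu."
one_row <- data.frame(as.list(s), check.names = FALSE)
print(one_row)
print(dim(one_row))
```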

asked Mar 28 '16 at 12:03 by Bastiaan Quast




1 Answer

You can use the standard-evaluation (SE) version of data_frame(), namely data_frame_(), and run:

df %>%
  group_by(class) %>%
  do(data_frame_(summary(.$value)))

Alternatively, wrap as.list() in data.frame() and set check.names = FALSE, so that names such as "1st Qu." are kept as-is instead of being converted to syntactic names like "X1st.Qu.":

df %>%
  group_by(class) %>%
  do(data.frame(as.list(summary(.$value)), check.names = FALSE))

Both versions produce:

# Source: local data frame [2 x 7]
# Groups: class [2]
# 
#    class  Min. 1st Qu. Median  Mean 3rd Qu.  Max.
#   (fctr) (dbl)   (dbl)  (dbl) (dbl)   (dbl) (dbl)
# 1      A   100     105    110   110     115   120
# 2      B   800     820    840   840     860   880
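Note that do() has since been superseded (as of dplyr 1.0.0). A minimal sketch of the same result without do(), assuming dplyr >= 1.0.0, where summarise() unpacks an unnamed data frame returned by its expression into separate columns. It should give one row per class with the same values as the table above.

```r
library(dplyr)

df <- data.frame(class = c('A', 'A', 'B', 'B'),
                 value = c(100, 120, 800, 880))

res <- df %>%
  group_by(class) %>%
  # The unnamed one-row data frame is unpacked into one column per statistic
  summarise(data.frame(as.list(summary(value)), check.names = FALSE))

print(res)
```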

answered Nov 15 '22 at 07:11 by JasonAizkalns