 

dplyr summarise_each() using multiple functions for different column subsets across the same groups

Tags:

r

dplyr

I'd like to use summarise_each() to apply multiple functions to a grouped dataset. However, rather than applying each function to all columns, I'd like to apply each function to a particular subset of columns. I realize I could do this by specifying each column in summarise(), but I have many variables.

Is there an alternative to either 1) using summarise_each() on all columns and then deleting the unneeded ones, or 2) saving the group_by() result, performing several separate summarise_each() operations, and combining the results?

If this is not clear, let me know and I can try to illustrate with some example code.

Cotton.Rockwood asked Jan 16 '16 00:01


People also ask

How to perform computations across multiple columns in dplyr?

The dplyr package (version 1.0.0 or later) is required. We'll use the function across() to perform computations across multiple columns. Its arguments are:
.cols: the columns you want to operate on. You can pick columns by position, name, function of name, type, or any combination thereof using Boolean operators.
.fns: a function, or list of functions, to apply to each column.
...:
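Applied to the question at the top of this page, across() lets a single summarise() call use a different function for each column subset, with no deleting or joining afterwards. A minimal sketch, assuming dplyr 1.0.0 or later; the particular column groupings are only illustrative:

```r
library(dplyr)

# One summarise() call with two across() calls: each applies its own
# function to its own subset of columns, within the same groups.
result <- iris %>%
  group_by(Species) %>%
  summarise(
    across(c(Petal.Width, Petal.Length), min),
    across(c(Sepal.Width, Sepal.Length), max)
  )

result
```

Because each across() call here uses a single function, the output columns keep their original names, so the two subsets never collide.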

What happens when a variable is named in dplyr?

If a variable in .vars is named, a new column by that name will be created. Name collisions in the new columns are disambiguated using a unique suffix. The functions are maturing, because the naming scheme and the disambiguation algorithm are subject to change in dplyr 0.9.0.
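A short sketch of that naming rule with summarise_at(), which takes its column selection through vars(). It assumes a dplyr version where the scoped verbs are still available (they are superseded by across() since 1.0.0), and the name min_pw is just an example:

```r
library(dplyr)

by_species <- iris %>% group_by(Species)

# Naming a variable inside vars() names the output column; with a
# single function, the new column takes that name as-is.
named <- by_species %>% summarise_at(vars(min_pw = Petal.Width), min)

named
```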

Is it possible to use multiple functions in summarise_* function?

You can use every feature of summarise_at(), such as applying several functions to several columns, and the same goes for all the summarise_* functions. Is this the kind of result you're looking for?
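For example, summarise_at() accepts a named list of functions. As a sketch (assuming a dplyr version that still provides the superseded scoped verbs), note that every function is applied to every selected column. That is exactly the all-combinations result the question's option 1 would then have to prune:

```r
library(dplyr)

# Every listed function is applied to every selected column; output
# columns are named <column>_<function name>.
both <- iris %>%
  group_by(Species) %>%
  summarise_at(vars(Petal.Width, Sepal.Width), list(min = min, max = max))

names(both)
```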

How to use across() in dplyr verbs?

Because across() is used within functions like summarise() and mutate(), you can't select or compute upon grouping variables. across() returns a tibble with one column for each column in .cols and each function in .fns. if_any() and if_all() return a logical vector. R code in dplyr verbs is generally evaluated once per group.
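Two of those points can be seen directly in a small sketch, assuming dplyr 1.0.4 or later (needed for if_all()). The threshold of 1 is an arbitrary choice for illustration:

```r
library(dplyr)

# Inside a grouped summarise(), everything() skips the grouping
# column, so Species stays as the key and is never passed to max().
maxes <- iris %>%
  group_by(Species) %>%
  summarise(across(everything(), max))

# if_all() returns one logical value per row, so it can drive filter():
# keep rows where both petal measurements exceed 1.
big_petals <- iris %>%
  filter(if_all(starts_with("Petal"), ~ .x > 1))
```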




1 Answer

I would suggest the following. Here I apply the min function to one variable and the max function to another, then merge the two results by the grouping variable.

library(dplyr)
by_species <- iris %>% group_by(Species)

Start with the variable to which I want to apply the min function:

min_var <- by_species %>% summarise_each(funs(min), Petal.Width)
min_var
Source: local data frame [3 x 2]

     Species Petal.Width
      (fctr)       (dbl)
1     setosa         0.1
2 versicolor         1.0
3  virginica         1.4

Then the variable to which I want to apply the max function:

max_var <- by_species %>% summarise_each(funs(max), Sepal.Width)
max_var
Source: local data frame [3 x 2]

     Species Sepal.Width
      (fctr)       (dbl)
1     setosa         4.4
2 versicolor         3.4
3  virginica         3.8

Now, we just merge the above two:

left_join(min_var, max_var)
Joining by: "Species"
Source: local data frame [3 x 3]

     Species Petal.Width Sepal.Width
      (fctr)       (dbl)       (dbl)
1     setosa         0.1         4.4
2 versicolor         1.0         3.4
3  virginica         1.4         3.8
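With many variables, the same join-based idea can be driven by a list instead of creating one object per function. summarise_each() and funs() used in this answer have since been deprecated, so this sketch assumes dplyr 1.0.0 or later and uses across() with all_of(). The specs list and its cols/fn fields are hypothetical names chosen for illustration:

```r
library(dplyr)

by_species <- iris %>% group_by(Species)

# Hypothetical spec list: each entry pairs a column subset with the
# single function to apply to it.
specs <- list(
  list(cols = c("Petal.Width", "Petal.Length"), fn = min),
  list(cols = c("Sepal.Width", "Sepal.Length"), fn = max)
)

# One grouped summary per spec...
pieces <- lapply(specs, function(s) {
  by_species %>% summarise(across(all_of(s$cols), s$fn))
})

# ...then join them all on the grouping column.
combined <- Reduce(function(x, y) left_join(x, y, by = "Species"), pieces)

combined
```

The column subsets in specs should not overlap; if they do, left_join() will add .x/.y suffixes to the shared names.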
Rushad Faridi answered Oct 24 '22 11:10