I'd like to use summarise_each() to apply multiple functions to a grouped dataset. However, rather than apply each function to all columns, I'd like to apply each function to particular subsets. I realize I could do this by specifying each column with summarise(), but I have many variables.
Is there an alternate solution to either 1) using summarise_each() and then deleting the unneeded columns or 2) saving the group_by() result, performing multiple separate summarise_each() operations and combining the results?
If this is not clear, let me know and I can try to illustrate with some example code.
The dplyr package (version >= 1.0.0) is required. We'll use the function across() to perform computations across multiple columns. Its arguments are:
.cols: Columns you want to operate on. You can pick columns by position, name, function of name, type, or any combination thereof using Boolean operators.
.fns: Function or list of functions to apply to each column.
...:
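As a minimal sketch of how this addresses the question, the example below applies min to one subset of columns and max to another within a single summarise() call. The iris data and the choice of starts_with() selectors are illustrative assumptions, not part of the original question; the functions are passed as named lists so that the default output names get a suffix.

```r
library(dplyr)

result <- iris %>%
  group_by(Species) %>%
  summarise(
    # min is applied only to the Petal.* columns
    across(starts_with("Petal"), list(min = min)),
    # max is applied only to the Sepal.* columns
    across(starts_with("Sepal"), list(max = max))
  )

result
```

Each across() call handles one subset of columns, so no unneeded columns are produced and no join is needed afterwards.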
For the older scoped variants such as summarise_at(): if a variable in .vars is named, a new column by that name will be created. Name collisions in the new columns are disambiguated using a unique suffix. The functions are maturing, because the naming scheme and the disambiguation algorithm are subject to change in dplyr 0.9.0.
It can use every feature of summarise_at(), such as applying several functions to several columns. You can do the same with all summarise_* functions. Is this the kind of result you seek?
Because across() is used within functions like summarise() and mutate(), you can't select or compute upon grouping variables. across() returns a tibble with one column for each column in .cols and each function in .fns. if_any() and if_all() return a logical vector. R code in dplyr verbs is generally evaluated once per group.
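The behaviour described above can be illustrated with a short sketch. It assumes the iris data set and a dplyr release recent enough to include if_any(), which was added after 1.0.0; the threshold of 4 is an arbitrary value chosen for illustration.

```r
library(dplyr)

by_species <- iris %>% group_by(Species)

# everything() covers only the four measurement columns here;
# the grouping column Species is not selected by across()
means <- by_species %>% summarise(across(everything(), mean))

# if_any() yields one logical value per row, so it can be used in filter():
# keep rows where at least one of the *.Width columns exceeds 4
wide <- iris %>% filter(if_any(ends_with("Width"), ~ .x > 4))
```

The grouping column still appears in means, but only as the key that summarise() adds, not as a column that mean was applied to.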
I would suggest the following: here I would like to apply the min function to one variable and the max function to another. Then I simply merge the results on the grouping variable.
> by_species <- iris %>% group_by(Species)
Start with the variable to which I want to apply the min function:
> min_var <- by_species %>% summarise_each(funs(min), Petal.Width)
> min_var
Source: local data frame [3 x 2]

     Species Petal.Width
      (fctr)       (dbl)
1     setosa         0.1
2 versicolor         1.0
3  virginica         1.4
Then the variable for which I want to apply the max function:
> max_var <- by_species %>% summarise_each(funs(max), Sepal.Width)
> max_var
Source: local data frame [3 x 2]

     Species Sepal.Width
      (fctr)       (dbl)
1     setosa         4.4
2 versicolor         3.4
3  virginica         3.8
Now, we just merge the above two:
> left_join(min_var, max_var)
Joining by: "Species"
Source: local data frame [3 x 3]

     Species Petal.Width Sepal.Width
      (fctr)       (dbl)       (dbl)
1     setosa         0.1         4.4
2 versicolor         1.0         3.4
3  virginica         1.4         3.8
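summarise_each() and funs() were deprecated in later dplyr releases, so the two-step approach above may emit warnings on a current installation. A rough equivalent, assuming dplyr >= 1.0.0 and naming the join key explicitly to avoid the "Joining by" message, might look like this:

```r
library(dplyr)

by_species <- iris %>% group_by(Species)

# Minimum of one variable per group
min_var <- by_species %>% summarise(across(Petal.Width, min))

# Maximum of another variable per group
max_var <- by_species %>% summarise(across(Sepal.Width, max))

# Combine the two summaries on the grouping variable
combined <- left_join(min_var, max_var, by = "Species")
combined
```

The values should match the joined table above; only the printed header and column type labels differ between dplyr versions.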