
summarise_at using different functions for different variables

Tags: r, dplyr, tidyverse

When I use group_by and summarise in dplyr, I can naturally apply different summary functions to different variables. For instance:

    library(tidyverse)

    df <- tribble(
      ~category,   ~x,  ~y,  ~z,
      #----------------------
          'a',      4,   6,   8,
          'a',      7,   3,   0,
          'a',      7,   9,   0,
          'b',      2,   8,   8,
          'b',      5,   1,   8,
          'b',      8,   0,   1,
          'c',      2,   1,   1,
          'c',      3,   8,   0,
          'c',      1,   9,   1
     )

    df %>% group_by(category) %>% summarize(
      x=mean(x),
      y=median(y),
      z=first(z)
    )

This produces the following output:

    # A tibble: 3 x 4
      category     x     y     z
         <chr> <dbl> <dbl> <dbl>
    1        a     6     6     8
    2        b     5     1     8
    3        c     2     8     1

My question is, how would I do this with summarise_at? Obviously for this example it's unnecessary, but assume I have lots of variables that I want to take the mean of, lots of medians, etc.

Do I lose this functionality once I move to summarise_at? Do I have to use all functions on all groups of variables and then throw away the ones I don't want?

Perhaps I'm just missing something, but I can't figure it out, and I don't see any examples of this in the documentation. Any help is appreciated.

David Pepper asked Sep 13 '17 03:09


1 Answer

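Since your question is specifically about summarise_at, here is one idea. Note that it applies every listed function to every selected variable, so each of x, y and z gets a mean, an sd and a min column (x_mean, x_sd, x_min, and so on):

    df %>%
      group_by(category) %>%
      summarise_at(
        vars(x, y, z),
        # list() replaces funs(), which is deprecated since dplyr 0.8.0
        list(mean = mean, sd = sd, min = min),
        na.rm = TRUE
      )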
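To get a different function per group of variables, which is what the question actually asks for, two options come to mind. The sketch below is only an illustration, not the one canonical way: the first variant runs one summarise_at call per function and joins the pieces on category, and the second assumes dplyr 1.0 or later, where across() can be used inside a single summarise. The names by_cat, result_at and result_across are made up for this example.

```r
library(dplyr)

# Same data as in the question, rebuilt so the example is self-contained
df <- tibble(
  category = rep(c("a", "b", "c"), each = 3),
  x = c(4, 7, 7, 2, 5, 8, 2, 3, 1),
  y = c(6, 3, 9, 8, 1, 0, 1, 8, 9),
  z = c(8, 0, 0, 8, 8, 1, 1, 0, 1)
)

by_cat <- df %>% group_by(category)

# Variant 1: one summarise_at() per function, then join on the grouping key.
# With many variables, put them all in the matching vars() call.
result_at <- by_cat %>%
  summarise_at(vars(x), mean) %>%
  inner_join(by_cat %>% summarise_at(vars(y), median), by = "category") %>%
  inner_join(by_cat %>% summarise_at(vars(z), first), by = "category")

# Variant 2 (assumes dplyr >= 1.0): several across() calls in one summarise()
result_across <- by_cat %>%
  summarise(
    across(x, mean),
    across(y, median),
    across(z, first)
  )

print(result_at)
print(result_across)
```

Both variants should give the same table as the plain summarise call in the question. With more columns, each across() or vars() call can take a vector of names or a selection helper such as starts_with().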
dido answered Sep 19 '22 12:09