 

dplyr number of rows across groups after filtering

Tags: r, dplyr, tidyverse

I want the count and the proportion (out of all rows) of each group in a data frame after filtering. This code produces the desired output:

library(dplyr)
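# tibble() replaces data_frame(), which is deprecated in current tibble/dplyr;
# sample() and rnorm() are random, so the counts differ from run to run
df <- tibble(id = sample(letters[1:3], 100, replace = TRUE),
             value = rnorm(100))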

summary <- filter(df, value > 0) %>%
    group_by(id) %>%
    summarize(count = n()) %>%
    ungroup() %>%
    mutate(proportion = count / sum(count))

> summary
# A tibble: 3 x 3
     id count proportion
  <chr> <int>      <dbl>
1     a    17  0.3695652
2     b    13  0.2826087
3     c    16  0.3478261
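As a side note, the ungroup() call in the pipeline above should not be needed: with a single grouping variable, summarise() drops that last grouping level by default, so the following mutate() already sums over every row. A minimal sketch, assuming dplyr 1.0 or later; the seed and the name summary_tbl are arbitrary choices for illustration:

```r
library(dplyr)

set.seed(42)  # arbitrary seed, only to make the sketch reproducible
df <- tibble(id = sample(letters[1:3], 100, replace = TRUE),
             value = rnorm(100))

summary_tbl <- df %>%
  filter(value > 0) %>%
  group_by(id) %>%
  summarise(count = n()) %>%               # one grouping variable, so the result is ungrouped
  mutate(proportion = count / sum(count))  # sum(count) therefore runs over all groups

summary_tbl
```

This still uses a second step (mutate()), but it removes the explicit ungroup().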

Is there an elegant way to avoid the ungroup() and the subsequent mutate() step? Something like:

summary <- filter(df, value > 0) %>%
    group_by(id) %>%
    summarize(count = n(),
              proportion = n() / [?TOTAL_ROWS()?])

I couldn't find such a function in the documentation, but I must be missing something obvious. Thanks!

asked Nov 27 '17 15:11 by Fridolin Linder


1 Answer

You can call nrow() on the dot (.), which the %>% pipe binds to the entire data frame piped in, i.e. all filtered rows regardless of grouping:

df %>% 
    filter(value > 0) %>% 
    group_by(id) %>% 
    summarise(count = n(), proportion = count / nrow(.))

# A tibble: 3 x 3
#     id count proportion
#  <chr> <int>      <dbl>
#1     a    14  0.2592593
#2     b    22  0.4074074
#3     c    18  0.3333333
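Two alternatives, both sketches assuming dplyr 1.0 or later. First, count() on an ungrouped data frame returns an ungrouped tibble with a column named n, so the proportion follows from a plain mutate() with no group_by()/ungroup() at all. Second, the dot placeholder used above is a feature of the magrittr %>% pipe; R's native |> pipe (R 4.1+) does not provide it, so there the filtered rows have to be stored first. The names pos, via_count and via_native are purely illustrative:

```r
library(dplyr)

set.seed(42)  # arbitrary seed, only for reproducibility
df <- tibble(id = sample(letters[1:3], 100, replace = TRUE),
             value = rnorm(100))

# Alternative 1: count() returns an ungrouped tibble with a column `n`
via_count <- df %>%
  filter(value > 0) %>%
  count(id) %>%
  mutate(proportion = n / sum(n))

# Alternative 2: native pipe has no `.` placeholder, so keep the
# filtered rows in a variable and take nrow() of that
pos <- df |> filter(value > 0)
via_native <- pos |>
  group_by(id) |>
  summarise(count = n(), proportion = count / nrow(pos))

via_count
via_native
```

Both produce the same counts and proportions as the nrow(.) version; which one reads better is mostly a matter of taste.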
answered Nov 04 '22 12:11 by Psidom