Given a data frame df <- tibble(X = c('A', 'A', 'B', 'B', 'B'), Y = c('M', 'N', 'M', 'M', 'N')), I need a data frame that gives the share of each Y value within each level of X: 50% of A's are M, 50% of A's are N, 67% of B's are M, and 33% of B's are N.
I have a small routine that does this, but it feels clumsy:
library(tidyverse)
df <- tibble(X = c('A', 'A', 'B', 'B', 'B'), Y = c('M', 'N', 'M', 'M', 'N'))
# here we go...
df %>% 
  group_by(X) %>% 
  mutate(n_X = n()) %>%                  # rows per X, repeated on every row
  group_by(X, Y) %>% 
  summarise(PERCENT = n() / first(n_X))  # rows per (X, Y) divided by rows per X
which outputs:
Source: local data frame [4 x 3]
Groups: X [?]
      X     Y   PERCENT
  <chr> <chr>     <dbl>
1     A     M 0.5000000
2     A     N 0.5000000
3     B     M 0.6666667
4     B     N 0.3333333
Is there a cleaner way to do this? Surely I'm missing something.
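You can use prop.table() on the counts: count each (X, Y) pair, then group by X alone so that prop.table() divides each count by its X total. (In dplyr 1.0 and later, count() keeps the grouping of its input, so calling it after group_by(X, Y) leaves one row per group and every PERCENT would be 1.)
df %>%
  count(X, Y) %>%                  # one row per (X, Y) with its count n
  group_by(X) %>%                  # normalise within each level of X
  mutate(PERCENT = prop.table(n))  # n / sum(n) inside each X group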
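prop.table() on a plain vector simply divides it by its sum, so inside each X group prop.table(n) is the same as n / sum(n). A minimal sketch checking this, rebuilding the same df with dplyr; the names counts_B and res are illustrative only:

```r
library(dplyr)

df <- tibble(X = c('A', 'A', 'B', 'B', 'B'),
             Y = c('M', 'N', 'M', 'M', 'N'))

# prop.table() on a vector is x / sum(x): B has 2 M's and 1 N
counts_B <- c(M = 2, N = 1)
prop.table(counts_B)                                    # M ~ 0.667, N ~ 0.333
all.equal(prop.table(counts_B), counts_B / sum(counts_B))  # TRUE

# Hence, within each X group, n / sum(n) gives the same PERCENT column
res <- df %>%
  count(X, Y) %>%                 # columns X, Y, n (ungrouped)
  group_by(X) %>%                 # proportions are taken within each X
  mutate(PERCENT = n / sum(n)) %>%
  ungroup()                       # drop the grouping once it is no longer needed
res
```

Writing n / sum(n) states the intent directly; prop.table(n) computes the same thing, so the choice between them is mostly stylistic.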
The result:
      X     Y     n   PERCENT
  <chr> <chr> <int>     <dbl>
1     A     M     1 0.5000000
2     A     N     1 0.5000000
3     B     M     2 0.6666667
4     B     N     1 0.3333333
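If you would rather avoid dplyr, base R's table() together with prop.table(margin = 1) (normalise across each row) should give the same numbers. A rough sketch, rebuilding df as a plain data.frame; the names tab, props and long are illustrative:

```r
# Base R only: no packages needed
df <- data.frame(X = c('A', 'A', 'B', 'B', 'B'),
                 Y = c('M', 'N', 'M', 'M', 'N'))

# Cross-tabulate: rows are levels of X, columns are levels of Y
tab <- with(df, table(X, Y))

# margin = 1 divides each cell by its row total, i.e. the X total
props <- prop.table(tab, margin = 1)
props   # row A: 0.5, 0.5; row B: 0.667, 0.333

# Back to one row per (X, Y), like the dplyr result
long <- as.data.frame(props, responseName = "PERCENT")
long <- long[order(long$X, long$Y), ]
long
```

One difference: table() includes every X/Y combination, so a pair that never occurs shows up with a proportion of 0, whereas count() on character columns returns only the observed pairs.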