Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Summarise? Count occurences in column based on another column

Tags:

r

dplyr

summarize

I believe this may have a simple solution but I'm having trouble describing what I need to do (and hence what to search for). I think I need the summarize function. My goal output is at the very bottom.

I'm trying to count the occurrences of a value between each unique value in another column. Here is an example df that hopefully illustrates what I need todo.

library(dplyr)

set.seed(1)
df <- tibble("name" = c(rep("dinah",2),rep("lucy",4),rep("sora",9)),
             "meal" = c(rep(c("chicken","beef","fish"),5)),
             "date" = seq(as.Date("1999/1/1"),as.Date("2000/1/1"),25),
             "num.wins" = sample(0:30)[1:15])

Among other things, I'm trying to summarize (sum) the types of meals each name had using this data.

df
# A tibble: 15 x 4
   name  meal    date       num.wins
   <chr> <chr>   <date>        <int>
 1 dinah chicken 1999-01-01        8
 2 dinah beef    1999-01-26       11
 3 lucy  fish    1999-02-20       16
 4 lucy  chicken 1999-03-17       25
 5 lucy  beef    1999-04-11        5
 6 lucy  fish    1999-05-06       23
 7 sora  chicken 1999-05-31       27
 8 sora  beef    1999-06-25       15
 9 sora  fish    1999-07-20       14
10 sora  chicken 1999-08-14        1
11 sora  beef    1999-09-08        4
12 sora  fish    1999-10-03        3
13 sora  chicken 1999-10-28       13
14 sora  beef    1999-11-22        6
15 sora  fish    1999-12-17       18

I've made progress with other calculations I'm interested in, below:

df %>% 
  group_by(name) %>% 
  summarise(count=n(),
            medianDate=median(date),
            life=(max(date)-min(date)),
            wins=sum(num.wins))

# A tibble: 3 x 5
  name  count medianDate life      wins
  <chr> <int> <date>     <time>   <int>
1 dinah     2 1999-01-13  25 days    19
2 lucy      4 1999-03-29  75 days    69
3 sora      9 1999-09-08 200 days   101

My goal is to add an additional column for each type of food, and have the sum of the occurrences of that food displayed in each row, like so:

  name  count medianDate life      wins  chicken  beef  fish
1 dinah     2 1999-01-13  25 days    19        1     1     0
2 lucy      4 1999-03-29  75 days    69        1     1     2
3 sora      9 1999-09-08 200 days   101        3     3     3
like image 561
tastycanofmalk Avatar asked Dec 06 '22 10:12

tastycanofmalk


2 Answers

Though older, and possibly on a deprecation path, reshape2::dcast does this nicely:

reshape2::dcast(df, name ~ meal)
#    name beef chicken fish
# 1 dinah    1       1    0
# 2  lucy    1       1    2
# 3  sora    3       3    3

You can understand the formula as rows ~ columns. By default, it will aggregate the values in the columns using the length function---which gives exactly what you want, the count of each.

This can be easily joined to your summary data:

df %>% 
  group_by(name) %>% 
  summarise(count=n(),
            medianDate=median(date),
            life=(max(date)-min(date)),
            wins=sum(num.wins)) %>%
  left_join(reshape2::dcast(df, name ~ meal))
# # A tibble: 3 x 8
#   name  count medianDate life      wins  beef chicken  fish
#   <chr> <int> <date>     <time>   <int> <int>   <int> <int>
# 1 dinah     2 1999-01-13  25 days    19     1       1     0
# 2 lucy      4 1999-03-29  75 days    69     1       1     2
# 3 sora      9 1999-09-08 200 days   101     3       3     3
like image 138
Gregor Thomas Avatar answered Jan 14 '23 08:01

Gregor Thomas


One option is to use table inside summarise as a list column, unnest and then spread it to 'wide' format

library(tidyverse)
df %>% 
  group_by(name) %>%
  summarise(count=n(),
             medianDate=median(date),
             life=(max(date)-min(date)),
             wins=sum(num.wins),
             n = list(enframe(table(meal))) ) %>%
  unnest %>%
  spread(name1, value, fill = 0)
# A tibble: 3 x 8
#  name  count medianDate life      wins  beef chicken  fish
#  <chr> <int> <date>     <time>   <int> <dbl>   <dbl> <dbl>
#1 dinah     2 1999-01-13  25 days    19     1       1     0
#2 lucy      4 1999-03-29  75 days    69     1       1     2
#3 sora      9 1999-09-08 200 days   101     3       3     3
like image 37
akrun Avatar answered Jan 14 '23 08:01

akrun