I believe this may have a simple solution but I'm having trouble describing what I need to do (and hence what to search for). I think I need the summarize
function. My goal output is at the very bottom.
I'm trying to count the occurrences of a value between each unique value in another column. Here is an example df
that hopefully illustrates what I need todo.
library(dplyr)
set.seed(1)
df <- tibble("name" = c(rep("dinah",2),rep("lucy",4),rep("sora",9)),
"meal" = c(rep(c("chicken","beef","fish"),5)),
"date" = seq(as.Date("1999/1/1"),as.Date("2000/1/1"),25),
"num.wins" = sample(0:30)[1:15])
Among other things, I'm trying to summarize (sum) the types of meals each name had using this data.
df
# A tibble: 15 x 4
name meal date num.wins
<chr> <chr> <date> <int>
1 dinah chicken 1999-01-01 8
2 dinah beef 1999-01-26 11
3 lucy fish 1999-02-20 16
4 lucy chicken 1999-03-17 25
5 lucy beef 1999-04-11 5
6 lucy fish 1999-05-06 23
7 sora chicken 1999-05-31 27
8 sora beef 1999-06-25 15
9 sora fish 1999-07-20 14
10 sora chicken 1999-08-14 1
11 sora beef 1999-09-08 4
12 sora fish 1999-10-03 3
13 sora chicken 1999-10-28 13
14 sora beef 1999-11-22 6
15 sora fish 1999-12-17 18
I've made progress with other calculations I'm interested in, below:
df %>%
group_by(name) %>%
summarise(count=n(),
medianDate=median(date),
life=(max(date)-min(date)),
wins=sum(num.wins))
# A tibble: 3 x 5
name count medianDate life wins
<chr> <int> <date> <time> <int>
1 dinah 2 1999-01-13 25 days 19
2 lucy 4 1999-03-29 75 days 69
3 sora 9 1999-09-08 200 days 101
My goal is to add an additional column for each type of food, and have the sum of the occurrences of that food displayed in each row, like so:
name count medianDate life wins chicken beef fish
1 dinah 2 1999-01-13 25 days 19 1 1 0
2 lucy 4 1999-03-29 75 days 69 1 1 2
3 sora 9 1999-09-08 200 days 101 3 3 3
Though older, and possibly on a deprecation path, reshape2::dcast
does this nicely:
reshape2::dcast(df, name ~ meal)
# name beef chicken fish
# 1 dinah 1 1 0
# 2 lucy 1 1 2
# 3 sora 3 3 3
You can understand the formula as rows ~ columns
. By default, it will aggregate the values in the columns using the length
function---which gives exactly what you want, the count of each.
This can be easily joined to your summary data:
df %>%
group_by(name) %>%
summarise(count=n(),
medianDate=median(date),
life=(max(date)-min(date)),
wins=sum(num.wins)) %>%
left_join(reshape2::dcast(df, name ~ meal))
# # A tibble: 3 x 8
# name count medianDate life wins beef chicken fish
# <chr> <int> <date> <time> <int> <int> <int> <int>
# 1 dinah 2 1999-01-13 25 days 19 1 1 0
# 2 lucy 4 1999-03-29 75 days 69 1 1 2
# 3 sora 9 1999-09-08 200 days 101 3 3 3
One option is to use table
inside summarise
as a list
column, unnest
and then spread
it to 'wide' format
library(tidyverse)
df %>%
group_by(name) %>%
summarise(count=n(),
medianDate=median(date),
life=(max(date)-min(date)),
wins=sum(num.wins),
n = list(enframe(table(meal))) ) %>%
unnest %>%
spread(name1, value, fill = 0)
# A tibble: 3 x 8
# name count medianDate life wins beef chicken fish
# <chr> <int> <date> <time> <int> <dbl> <dbl> <dbl>
#1 dinah 2 1999-01-13 25 days 19 1 1 0
#2 lucy 4 1999-03-29 75 days 69 1 1 2
#3 sora 9 1999-09-08 200 days 101 3 3 3
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With