I am trying to expand my data frame so that each id/year/month combination has a row for every one of the seven weekdays, with NA in amount for the weekdays that are missing.
Here is the data frame and my attempt at achieving this task:
> df
id year month weekday amount
1 1 2015 1 Friday 3650.43
2 2 2015 1 Monday 1271.12
3 1 2015 2 Friday 1315.79
4 2 2015 2 Monday 2195.37
> wday
weekday
1 Friday
2 Saturday
3 Wednesday
4 Sunday
5 Tuesday
6 Monday
7 Thursday
I tried group_by() followed by a right join, but it does not produce what I expected. Is there a simple way to achieve the result I am after?
> df <- df %>% group_by(id, year, month) %>% right_join(wday)
Joining by: "weekday"
> df
Source: local data frame [9 x 5]
Groups: id, year, month [?]
id year month weekday amount
(dbl) (int) (int) (chr) (dbl)
1 1 2015 1 Friday 3650.43
2 1 2015 2 Friday 1315.79
3 NA NA NA Saturday NA
4 NA NA NA Wednesday NA
5 NA NA NA Sunday NA
6 NA NA NA Tuesday NA
7 2 2015 1 Monday 1271.12
8 2 2015 2 Monday 2195.37
9 NA NA NA Thursday NA
I want 7 rows per id/year/month combination where amount for missing weekdays will be NA (or zeroes ideally, but I know how to get that by mutate()).
Resulting data frame should look like this:
> df
id year month weekday amount
1 1 2015 1 Friday 3650.43
2 1 2015 1 Monday 0.00
3 1 2015 1 Saturday 0.00
4 1 2015 1 Sunday 0.00
5 1 2015 1 Thursday 0.00
6 1 2015 1 Tuesday 0.00
7 1 2015 1 Wednesday 0.00
8 1 2015 2 Friday 1315.79
9 1 2015 2 Monday 0.00
10 1 2015 2 Saturday 0.00
11 1 2015 2 Sunday 0.00
12 1 2015 2 Thursday 0.00
13 1 2015 2 Tuesday 0.00
14 1 2015 2 Wednesday 0.00
15 2 2015 1 Friday 0.00
16 2 2015 1 Monday 1271.12
17 2 2015 1 Saturday 0.00
18 2 2015 1 Sunday 0.00
19 2 2015 1 Thursday 0.00
20 2 2015 1 Tuesday 0.00
21 2 2015 1 Wednesday 0.00
22 2 2015 2 Friday 0.00
23 2 2015 2 Monday 2195.37
24 2 2015 2 Saturday 0.00
25 2 2015 2 Sunday 0.00
26 2 2015 2 Thursday 0.00
27 2 2015 2 Tuesday 0.00
28 2 2015 2 Wednesday 0.00
Using tidyr and dplyr. complete() does the heavy lifting here. If every weekday already appears somewhere in df, you won't need bind_rows() or na.omit() (or dplyr at all).
library(dplyr)
library(tidyr)
df %>%                                   # initial data
  bind_rows(wday) %>%                    # adding on so we have all the weekdays
  complete(id, year, month, weekday,     # completing all levels of id:year:month:weekday
           fill = list(amount = 0)) %>%  # filling amount column with 0
  na.omit()                              # remove the NAs we got from the bind_rows
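As a possible variation on the above, here is a minimal sketch that skips the bind_rows()/na.omit() round trip by handing complete() the weekday values directly and wrapping the grouping columns in nesting(), so only the id/year/month combinations already present in the data are expanded. The df and wday objects are rebuilt here from the question's printout; their column types (character weekdays, integer year and month) are assumptions.

```r
library(tidyr)

# Data rebuilt from the question; column types are assumed
df <- data.frame(
  id      = c(1, 2, 1, 2),
  year    = c(2015L, 2015L, 2015L, 2015L),
  month   = c(1L, 1L, 2L, 2L),
  weekday = c("Friday", "Monday", "Friday", "Monday"),
  amount  = c(3650.43, 1271.12, 1315.79, 2195.37),
  stringsAsFactors = FALSE
)
wday <- data.frame(
  weekday = c("Friday", "Saturday", "Wednesday", "Sunday",
              "Tuesday", "Monday", "Thursday"),
  stringsAsFactors = FALSE
)

# nesting() keeps only id/year/month combinations that occur in df;
# each of them is crossed with all seven weekdays, and amount is
# filled with 0 for the rows that had to be created
out <- complete(df,
                nesting(id, year, month),
                weekday = wday$weekday,
                fill = list(amount = 0))
```

With this data, out should have seven rows for each of the four id/year/month combinations.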
We can use expand.grid
expand.grid(c(lapply(df[1:3], unique), wday['weekday'])) %>%
left_join(., df) %>%
mutate(amount=replace(amount, is.na(amount), 0)) %>%
arrange(id, year, month, weekday)
# id year month weekday amount
#1 1 2015 1 Friday 3650.43
#2 1 2015 1 Monday 0.00
#3 1 2015 1 Saturday 0.00
#4 1 2015 1 Sunday 0.00
#5 1 2015 1 Thursday 0.00
#6 1 2015 1 Tuesday 0.00
#7 1 2015 1 Wednesday 0.00
#8 1 2015 2 Friday 1315.79
#9 1 2015 2 Monday 0.00
#10 1 2015 2 Saturday 0.00
#11 1 2015 2 Sunday 0.00
#12 1 2015 2 Thursday 0.00
#13 1 2015 2 Tuesday 0.00
#14 1 2015 2 Wednesday 0.00
#15 2 2015 1 Friday 0.00
#16 2 2015 1 Monday 1271.12
#17 2 2015 1 Saturday 0.00
#18 2 2015 1 Sunday 0.00
#19 2 2015 1 Thursday 0.00
#20 2 2015 1 Tuesday 0.00
#21 2 2015 1 Wednesday 0.00
#22 2 2015 2 Friday 0.00
#23 2 2015 2 Monday 2195.37
#24 2 2015 2 Saturday 0.00
#25 2 2015 2 Sunday 0.00
#26 2 2015 2 Thursday 0.00
#27 2 2015 2 Tuesday 0.00
#28 2 2015 2 Wednesday 0.00
sqldf
For complex joins it is usually easier to use SQL:
library(sqldf)
sqldf("select
id,
year,
month,
wday.weekday,
sum((df.weekday = wday.weekday) * amount) amount
from df
join wday
group by 1, 2, 3, 4")
giving:
id year month weekday amount
1 1 2015 1 Friday 3650.43
2 1 2015 1 Saturday 0.00
3 1 2015 1 Wednesday 0.00
4 1 2015 1 Sunday 0.00
5 1 2015 1 Tuesday 0.00
6 1 2015 1 Monday 0.00
7 1 2015 1 Thursday 0.00
8 2 2015 1 Friday 0.00
9 2 2015 1 Saturday 0.00
10 2 2015 1 Wednesday 0.00
11 2 2015 1 Sunday 0.00
12 2 2015 1 Tuesday 0.00
13 2 2015 1 Monday 1271.12
14 2 2015 1 Thursday 0.00
15 1 2015 2 Friday 1315.79
16 1 2015 2 Saturday 0.00
17 1 2015 2 Wednesday 0.00
18 1 2015 2 Sunday 0.00
19 1 2015 2 Tuesday 0.00
20 1 2015 2 Monday 0.00
21 1 2015 2 Thursday 0.00
22 2 2015 2 Friday 0.00
23 2 2015 2 Saturday 0.00
24 2 2015 2 Wednesday 0.00
25 2 2015 2 Sunday 0.00
26 2 2015 2 Tuesday 0.00
27 2 2015 2 Monday 2195.37
28 2 2015 2 Thursday 0.00
base R
We could replicate this in base R using merge and transform:
xt <- transform(
  merge(df, wday, by = c()),  # by = c() gives the cross join of df and wday
  amount = (as.character(weekday.x) == as.character(weekday.y)) * amount,
  weekday = weekday.y,
  weekday.x = NULL,
  weekday.y = NULL
)
aggregate(amount ~., xt, sum)
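A different base R route, shown here only as a hedged sketch and not as part of the answer above, is to build the full id/year/month/weekday grid first and then left-merge the amounts onto it. The df and wday objects are rebuilt from the question's printout with assumed column types, and keys, grid and out are illustrative names.

```r
# Data rebuilt from the question; column types are assumed
df <- data.frame(
  id      = c(1, 2, 1, 2),
  year    = c(2015L, 2015L, 2015L, 2015L),
  month   = c(1L, 1L, 2L, 2L),
  weekday = c("Friday", "Monday", "Friday", "Monday"),
  amount  = c(3650.43, 1271.12, 1315.79, 2195.37),
  stringsAsFactors = FALSE
)
wday <- data.frame(
  weekday = c("Friday", "Saturday", "Wednesday", "Sunday",
              "Tuesday", "Monday", "Thursday"),
  stringsAsFactors = FALSE
)

# Distinct id/year/month combinations present in the data
keys <- unique(df[c("id", "year", "month")])

# keys and wday share no column names, so merge() returns their
# Cartesian product: every combination paired with every weekday
grid <- merge(keys, wday)

# Left join the amounts; weekdays with no match get NA, then 0
out <- merge(grid, df, all.x = TRUE)
out$amount[is.na(out$amount)] <- 0
out <- out[order(out$id, out$year, out$month, out$weekday), ]
```

Because unmatched weekdays are filled after the join, this version needs no aggregation step, but it assumes at most one row per id/year/month/weekday in df.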
dplyr
If we really wanted to use dplyr, we could replace the transform with mutate, rename and select:
library(dplyr)
merge(df, wday, by = c()) %>%
mutate(amount = (as.character(weekday.x) == as.character(weekday.y)) * amount) %>%
rename(weekday = weekday.y) %>%
select(-weekday.x) %>%
group_by(id, year, month, weekday) %>%
summarise(amount = sum(amount))
Note: If there is only one weekday per group (as in the question) we could optionally omit group by/sum, aggregate and group_by/summarise in the three solutions respectively.