I have the following data.table, dtgrouped2:
MonthNo Unique Total
1: 1 AAA 10
2: 1 BBB 0
3: 2 CCC 3
4: 2 DDD 0
5: 3 AAA 0
6: 3 BBB 35
7: 4 CCC 15
8: 4 AAA 0
9: 5 BBB 60
10: 5 CCC 0
11: 6 DDD 100
12: 6 AAA 0
And another table, dt2:
Unique1 StartDate EndDate Amount1 Amount2
1: AAA 0 1 7 0
3: AAA 1 2 5 0
2: AAA 2 4 3 2
I want to insert Amount1 and Amount2 from dt2 to dtgrouped2 based on the following logic on "Unique" evaluated for each row of dtgrouped2:
filter(StartDate< MonthNo & EndDate>=MonthNo)
then MAX(EndDate)
then insert Amount1 as Amount1 and Amount2 as Amount2
So you can see that the result is different depending on the row. This would be the expected output:
Date MonthNo Unique Items Amounts Amount1 Amount2
Jan 1 AAA x 10 7 0
Jan 1 BBB y 2 NA NA
Feb 2 CCC x 3 NA NA
Feb 2 DDD y 15 NA NA
March 3 AAA y 20 3 2
March 3 BBB x 35 NA NA
April 4 CCC x 15 NA NA
April 4 AAA y 50 3 2
May 5 BBB x 60 NA NA
May 5 CCC y 70 NA NA
June 6 DDD x 100 NA NA
June 6 AAA y 20 NA NA
I suggest to use non-equi joins combined with mult = "last" (in order to capture only the most recent EndDate)
dtgrouped2[, c("Amount1", "Amount2") := # Assign the below result to new columns in dtgrouped2
dt2[dtgrouped2, # join
.(Amount1, Amount2), # get the column you need
on = .(Unique1 = Unique, # join conditions
StartDate < MonthNo,
EndDate >= MonthNo),
mult = "last"]] # get always the latest EndDate
dtgrouped2
# MonthNo Unique Total Amount1 Amount2
# 1: 1 AAA 10 7 0
# 2: 1 BBB 0 NA NA
# 3: 2 CCC 3 NA NA
# 4: 2 DDD 0 NA NA
# 5: 3 AAA 0 3 2
# 6: 3 BBB 35 NA NA
# 7: 4 CCC 15 NA NA
# 8: 4 AAA 0 3 2
# 9: 5 BBB 60 NA NA
# 10: 5 CCC 0 NA NA
# 11: 6 DDD 100 NA NA
# 12: 6 AAA 0 NA NA
The reason that you would need to join dt2[dtgrouped] first (and not the other way around) is because you want to join dt2 for each possible value in dtgrouped, hence allow multiple values in dt2 to be joined to dtgrouped
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With