I have the following data.table, dtgrouped2:
MonthNo Unique Total
1: 1 AAA 10
2: 1 BBB 0
3: 2 CCC 3
4: 2 DDD 0
5: 3 AAA 0
6: 3 BBB 35
7: 4 CCC 15
8: 4 AAA 0
9: 5 BBB 60
10: 5 CCC 0
11: 6 DDD 100
12: 6 AAA 0
And another table, dt2:
Unique1 StartDate EndDate Amount1 Amount2
1: AAA 0 1 7 0
3: AAA 1 2 5 0
2: AAA 2 4 3 2
I want to insert Amount1 and Amount2 from dt2 into dtgrouped2, matching on "Unique" and applying the following logic to each row of dtgrouped2:
filter(StartDate< MonthNo & EndDate>=MonthNo)
then MAX(EndDate)
then insert Amount1 as Amount1 and Amount2 as Amount2
So you can see that the result is different depending on the row. This would be the expected output:
Date MonthNo Unique Items Amounts Amount1 Amount2
Jan 1 AAA x 10 7 0
Jan 1 BBB y 2 NA NA
Feb 2 CCC x 3 NA NA
Feb 2 DDD y 15 NA NA
March 3 AAA y 20 3 2
March 3 BBB x 35 NA NA
April 4 CCC x 15 NA NA
April 4 AAA y 50 3 2
May 5 BBB x 60 NA NA
May 5 CCC y 70 NA NA
June 6 DDD x 100 NA NA
June 6 AAA y 20 NA NA
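I suggest using a non-equi join combined with mult = "last" (in order to keep only the match with the most recent EndDate). Note that mult = "last" returns the last matching row in dt2's row order, so this relies on dt2 being sorted by EndDate within each Unique1, as it is in the example.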
dtgrouped2[, c("Amount1", "Amount2") :=       # assign the result below to new columns in dtgrouped2
  dt2[dtgrouped2,                             # join
      .(Amount1, Amount2),                    # get the columns you need
      on = .(Unique1 = Unique,                # join conditions
             StartDate < MonthNo,
             EndDate >= MonthNo),
      mult = "last"]]                         # always keep the last match (latest EndDate)
dtgrouped2
# MonthNo Unique Total Amount1 Amount2
# 1: 1 AAA 10 7 0
# 2: 1 BBB 0 NA NA
# 3: 2 CCC 3 NA NA
# 4: 2 DDD 0 NA NA
# 5: 3 AAA 0 3 2
# 6: 3 BBB 35 NA NA
# 7: 4 CCC 15 NA NA
# 8: 4 AAA 0 3 2
# 9: 5 BBB 60 NA NA
# 10: 5 CCC 0 NA NA
# 11: 6 DDD 100 NA NA
# 12: 6 AAA 0 NA NA
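If you want to check the join against the logic stated in the question, a row-by-row version can serve as a reference. This is only a sketch: the tables are rebuilt by hand from the printed data, integer types are assumed for MonthNo, StartDate and EndDate, and pick() is a hypothetical helper that does not come from data.table. It selects the maximum EndDate explicitly, so it does not depend on the row order of dt2.

```r
library(data.table)

# Tables rebuilt from the printed data in the question (types are an assumption)
dtgrouped2 <- data.table(
  MonthNo = rep(1:6, each = 2),
  Unique  = c("AAA", "BBB", "CCC", "DDD", "AAA", "BBB",
              "CCC", "AAA", "BBB", "CCC", "DDD", "AAA"),
  Total   = c(10, 0, 3, 0, 0, 35, 15, 0, 60, 0, 100, 0)
)
dt2 <- data.table(
  Unique1   = c("AAA", "AAA", "AAA"),
  StartDate = c(0L, 1L, 2L),
  EndDate   = c(1L, 2L, 4L),
  Amount1   = c(7, 5, 3),
  Amount2   = c(0, 0, 2)
)

# Hypothetical helper: filter, then take the row with the largest EndDate
pick <- function(u, m) {
  hit <- dt2[Unique1 == u & StartDate < m & EndDate >= m]
  if (nrow(hit) == 0L) {
    return(list(Amount1 = NA_real_, Amount2 = NA_real_))
  }
  best <- which.max(hit$EndDate)
  list(Amount1 = hit$Amount1[best], Amount2 = hit$Amount2[best])
}

# One lookup per row of dtgrouped2
reference <- rbindlist(Map(pick, dtgrouped2$Unique, dtgrouped2$MonthNo))

# The non-equi join from the answer, for comparison
joined <- dt2[dtgrouped2,
              .(Amount1, Amount2),
              on = .(Unique1 = Unique, StartDate < MonthNo, EndDate >= MonthNo),
              mult = "last"]

all.equal(reference$Amount1, joined$Amount1)
all.equal(reference$Amount2, joined$Amount2)
```

The loop is much slower than the join on large tables, so it is only meant as a sanity check on a small sample.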
The join has to be written as dt2[dtgrouped2] (and not the other way around) because you want a lookup in dt2 for each row of dtgrouped2. Written this way, the result has one row per row of dtgrouped2, several rows of dt2 are allowed to match the same row, and mult = "last" keeps only one of them.