I have two tables that I need to do a sumif across. Table 1 contains time periods, i.e. year and quarter at year end (i.e. 4, 8, 12 etc.). Table 2 contains the transactions during the year at quarters 3, 6, 7 etc.
I need Table 3 to sum all the transactions during the year so that I get the cumulative position at year end.
Here's some sample code to explain what the data looks like and what the output should look like:
library(data.table)
x1 <- data.table("Name" = "LOB1", "Year" = 2000,
"Quarter" = c(4, 8, 12, 16, 20, 24, 28, 32, 36))
x2 <- data.table("Name" = "LOB1", "Year" = 2000,
"Quarter" = c(3, 6, 7, 9, 11, 14, 16, 20, 24),
"Amount" = c(10000, 15000, -2500, 3500, -6500, 25000,
11000, 9000, 7500))
x3 <- data.table("Name" = "LOB1", "Year" = 2000,
"Quarter" = c(4, 8, 12, 16, 20, 24, 28, 32, 36),
"Amount" = c(10000, 22500, 19500, 55500, 64500, 72000,
72000, 72000, 72000))
I've tried merge, summarise, foverlaps but can't quite figure it out.
Nice question. What basically you are trying to do is to join by Name, Year and Quarter <= Quarter, while summing all the matched Amount values. This is both possible using the new non-equi joins (which were introduced in the latest stable version of data.table v-1.10.0) and foverlaps (while the latter will be probably sub-optimal)
Non-Equi joins:
x2[x1, # for each value in `x1` find all the matching values in `x2`
.(Amount = sum(Amount)), # Sum all the matching values in `Amount`
on = .(Name, Year, Quarter <= Quarter), # join conditions
by = .EACHI] # Do the summing per each match in `i`
# Name Year Quarter Amount
# 1: LOB1 2000 4 10000
# 2: LOB1 2000 8 22500
# 3: LOB1 2000 12 19500
# 4: LOB1 2000 16 55500
# 5: LOB1 2000 20 64500
# 6: LOB1 2000 24 72000
# 7: LOB1 2000 28 72000
# 8: LOB1 2000 32 72000
# 9: LOB1 2000 36 72000
As a side note, you can easily add Amount in place in x1 (proposed by @Frank):
x1[, Amount :=
x2[x1, sum(x.Amount), on = .(Name, Year, Quarter <= Quarter), by = .EACHI]$V1
]
This might be convenient if you have more than just the three join columns in that table.
foverlaps:
You mentioned foverlaps, so in theory you could achieve the same using this function too. Though I'm afraid you will easily get out of memory. Using foverlaps, you will need to create a huge table where each value in x2 joined multiple times to each value in x1 and store everything in memory
x1[, Start := 0] # Make sure that we always join starting from Q0
x2[, Start := Quarter] # In x2 we want to join all possible rows each time
setkey(x2, Name, Year, Start, Quarter) # set keys
## Make a huge cartesian join by overlaps and then aggregate
foverlaps(x1, x2)[, .(Amount = sum(Amount)), by = .(Name, Year, Quarter = i.Quarter)]
# Name Year Quarter Amount
# 1: LOB1 2000 4 10000
# 2: LOB1 2000 8 22500
# 3: LOB1 2000 12 19500
# 4: LOB1 2000 16 55500
# 5: LOB1 2000 20 64500
# 6: LOB1 2000 24 72000
# 7: LOB1 2000 28 72000
# 8: LOB1 2000 32 72000
# 9: LOB1 2000 36 72000
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With