I have two tables that I need to do a sumif across. Table 1 contains time periods, i.e. year and quarter at year end (i.e. 4
, 8
, 12
etc.). Table 2 contains the transactions during the year at quarters 3
, 6
, 7
etc.
I need Table 3 to sum all the transactions during the year so that I get the cumulative position at year end.
Here's some sample code to explain what the data looks like and what the output should look like:
library(data.table)
x1 <- data.table("Name" = "LOB1", "Year" = 2000,
"Quarter" = c(4, 8, 12, 16, 20, 24, 28, 32, 36))
x2 <- data.table("Name" = "LOB1", "Year" = 2000,
"Quarter" = c(3, 6, 7, 9, 11, 14, 16, 20, 24),
"Amount" = c(10000, 15000, -2500, 3500, -6500, 25000,
11000, 9000, 7500))
x3 <- data.table("Name" = "LOB1", "Year" = 2000,
"Quarter" = c(4, 8, 12, 16, 20, 24, 28, 32, 36),
"Amount" = c(10000, 22500, 19500, 55500, 64500, 72000,
72000, 72000, 72000))
I've tried merge
, summarise
, foverlaps
but can't quite figure it out.
Nice question. What basically you are trying to do is to join by Name
, Year
and Quarter <= Quarter
, while summing all the matched Amount
values. This is both possible using the new non-equi joins (which were introduced in the latest stable version of data.table v-1.10.0) and foverlaps
(while the latter will be probably sub-optimal)
Non-Equi joins:
x2[x1, # for each value in `x1` find all the matching values in `x2`
.(Amount = sum(Amount)), # Sum all the matching values in `Amount`
on = .(Name, Year, Quarter <= Quarter), # join conditions
by = .EACHI] # Do the summing per each match in `i`
# Name Year Quarter Amount
# 1: LOB1 2000 4 10000
# 2: LOB1 2000 8 22500
# 3: LOB1 2000 12 19500
# 4: LOB1 2000 16 55500
# 5: LOB1 2000 20 64500
# 6: LOB1 2000 24 72000
# 7: LOB1 2000 28 72000
# 8: LOB1 2000 32 72000
# 9: LOB1 2000 36 72000
As a side note, you can easily add Amount
in place in x1
(proposed by @Frank):
x1[, Amount :=
x2[x1, sum(x.Amount), on = .(Name, Year, Quarter <= Quarter), by = .EACHI]$V1
]
This might be convenient if you have more than just the three join columns in that table.
foverlaps:
You mentioned foverlaps
, so in theory you could achieve the same using this function too. Though I'm afraid you will easily get out of memory. Using foverlaps
, you will need to create a huge table where each value in x2
joined multiple times to each value in x1
and store everything in memory
x1[, Start := 0] # Make sure that we always join starting from Q0
x2[, Start := Quarter] # In x2 we want to join all possible rows each time
setkey(x2, Name, Year, Start, Quarter) # set keys
## Make a huge cartesian join by overlaps and then aggregate
foverlaps(x1, x2)[, .(Amount = sum(Amount)), by = .(Name, Year, Quarter = i.Quarter)]
# Name Year Quarter Amount
# 1: LOB1 2000 4 10000
# 2: LOB1 2000 8 22500
# 3: LOB1 2000 12 19500
# 4: LOB1 2000 16 55500
# 5: LOB1 2000 20 64500
# 6: LOB1 2000 24 72000
# 7: LOB1 2000 28 72000
# 8: LOB1 2000 32 72000
# 9: LOB1 2000 36 72000
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With