Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to sumif across two tables?

Tags:

r

data.table

I have two tables that I need to do a sumif across. Table 1 contains time periods, i.e. year and quarter at year end (i.e. 4, 8, 12 etc.). Table 2 contains the transactions during the year at quarters 3, 6, 7 etc.

I need Table 3 to sum all the transactions during the year so that I get the cumulative position at year end.

Here's some sample code to explain what the data looks like and what the output should look like:

library(data.table)

x1 <- data.table("Name" = "LOB1", "Year" = 2000, 
                 "Quarter" = c(4, 8, 12, 16, 20, 24, 28, 32, 36))
x2 <- data.table("Name" = "LOB1", "Year" = 2000, 
                 "Quarter" = c(3, 6, 7, 9, 11, 14, 16, 20, 24), 
                 "Amount" = c(10000, 15000, -2500, 3500, -6500, 25000, 
                              11000, 9000, 7500))
x3 <- data.table("Name" = "LOB1", "Year" = 2000, 
                 "Quarter" = c(4, 8, 12, 16, 20, 24, 28, 32, 36), 
                 "Amount" = c(10000, 22500, 19500, 55500, 64500, 72000, 
                              72000, 72000, 72000))

I've tried merge, summarise, foverlaps but can't quite figure it out.

like image 528
kodfather Avatar asked Jan 16 '17 19:01

kodfather


1 Answers

Nice question. What basically you are trying to do is to join by Name, Year and Quarter <= Quarter, while summing all the matched Amount values. This is both possible using the new non-equi joins (which were introduced in the latest stable version of data.table v-1.10.0) and foverlaps (while the latter will be probably sub-optimal)

Non-Equi joins:

x2[x1, # for each value in `x1` find all the matching values in `x2`
   .(Amount = sum(Amount)), # Sum all the matching values in `Amount`
   on = .(Name, Year, Quarter <= Quarter), # join conditions
   by = .EACHI] # Do the summing per each match in `i`
#    Name Year Quarter Amount
# 1: LOB1 2000       4  10000
# 2: LOB1 2000       8  22500
# 3: LOB1 2000      12  19500
# 4: LOB1 2000      16  55500
# 5: LOB1 2000      20  64500
# 6: LOB1 2000      24  72000
# 7: LOB1 2000      28  72000
# 8: LOB1 2000      32  72000
# 9: LOB1 2000      36  72000

As a side note, you can easily add Amount in place in x1 (proposed by @Frank):

x1[, Amount := 
  x2[x1, sum(x.Amount), on = .(Name, Year, Quarter <= Quarter), by = .EACHI]$V1
]

This might be convenient if you have more than just the three join columns in that table.


foverlaps:

You mentioned foverlaps, so in theory you could achieve the same using this function too. Though I'm afraid you will easily get out of memory. Using foverlaps, you will need to create a huge table where each value in x2 joined multiple times to each value in x1 and store everything in memory

x1[, Start := 0] # Make sure that we always join starting from Q0
x2[, Start := Quarter] # In x2 we want to join all possible rows each time 
setkey(x2, Name, Year, Start, Quarter) # set keys
## Make a huge cartesian join by overlaps and then aggregate
foverlaps(x1, x2)[, .(Amount = sum(Amount)), by = .(Name, Year, Quarter = i.Quarter)]
#    Name Year Quarter Amount
# 1: LOB1 2000       4  10000
# 2: LOB1 2000       8  22500
# 3: LOB1 2000      12  19500
# 4: LOB1 2000      16  55500
# 5: LOB1 2000      20  64500
# 6: LOB1 2000      24  72000
# 7: LOB1 2000      28  72000
# 8: LOB1 2000      32  72000
# 9: LOB1 2000      36  72000
like image 153
David Arenburg Avatar answered Sep 30 '22 17:09

David Arenburg