I have two data tables that I'm trying to merge. One is data on company market values through time and the other is company dividend history through time. I'm trying to find out how much each company has paid each quarter and put that value next to the market value data through time.
library(magrittr)
library(data.table)
library(zoo)
library(lubridate)
set.seed(1337)
# data table of company market values
companies <-
data.table(companyID = 1:10,
Sedol = rep(c("91772E", "7A662B"), each = 5),
Date = (as.Date("2005-04-01") + months(seq(0, 12, 3))) - days(1),
MktCap = c(100 + cumsum(rnorm(5,5)),
50 + cumsum(rnorm(5,1,5)))) %>%
setkey(Sedol, Date)
# data table of dividends
dividends <-
data.table(DivID = 1:7,
Sedol = c(rep('91772E', each = 4), rep('7A662B', each = 3)),
Date = as.Date(c('2004-11-19', '2005-01-13', '2005-01-29',
'2005-10-01', '2005-06-29', '2005-06-30',
'2006-04-17')),
DivAmnt = rnorm(7, .8, .3)) %>%
setkey(Sedol, Date)
I believe this is a situation where you could use a data.table rolling join, something like:
dividends[companies, roll = "nearest"]
to try and get a dataset that looks like
DivID Sedol Date DivAmnt companyID MktCap
1: NA 7A662B <NA> NA 6 61.21061
2: 5 7A662B 2005-06-29 0.7772631 7 66.92951
3: 6 7A662B 2005-06-30 1.1815343 7 66.92951
4: NA 7A662B <NA> NA 8 78.33914
5: NA 7A662B <NA> NA 9 88.92473
6: NA 7A662B <NA> NA 10 87.85067
7: 2 91772E 2005-01-13 0.2964291 1 105.19249
8: 3 91772E 2005-01-29 0.8472649 1 105.19249
9: NA 91772E <NA> NA 2 108.74579
10: 4 91772E 2005-10-01 1.2467408 3 113.42261
11: NA 91772E <NA> NA 4 120.04491
12: NA 91772E <NA> NA 5 124.35588
(note that I've matched the dividends to the company market values by the exact quarter)
But I'm not exactly sure how to execute it. The CRAN pdf is rather vague about what the number is or should be if roll
is a value (Can you pass dates? Does a number quantify the days forward to carry? the number of obersvations?) and changing rollends
around doesn't seem to get me what I want.
In the end, I ended up mapping the dividend dates to their quarter end and then joining on that. A good solution, but not useful if I end up needing to know how to perform rolling joins. In your answer, could you describe a situation where rolling joins are the only solution as well as help me understand how to perform them?
Instead of a rolling join, you may want to use an overlap join with the foverlaps
function of data.table:
# create an interval in the 'companies' datatable
companies[, `:=` (start = compDate - days(90), end = compDate + days(15))]
# create a second date in the 'dividends' datatable
dividends[, Date2 := divDate]
# set the keys for the two datatable
setkey(companies, Sedol, start, end)
setkey(dividends, Sedol, divDate, Date2)
# create a vector of columnnames which can be removed afterwards
deletecols <- c("Date2","start","end")
# perform the overlap join and remove the helper columns
res <- foverlaps(companies, dividends)[, (deletecols) := NULL]
the result:
> res Sedol DivID divDate DivAmnt companyID compDate MktCap 1: 7A662B NA <NA> NA 6 2005-03-31 61.21061 2: 7A662B 5 2005-06-29 0.7772631 7 2005-06-30 66.92951 3: 7A662B 6 2005-06-30 1.1815343 7 2005-06-30 66.92951 4: 7A662B NA <NA> NA 8 2005-09-30 78.33914 5: 7A662B NA <NA> NA 9 2005-12-31 88.92473 6: 7A662B NA <NA> NA 10 2006-03-31 87.85067 7: 91772E 2 2005-01-13 0.2964291 1 2005-03-31 105.19249 8: 91772E 3 2005-01-29 0.8472649 1 2005-03-31 105.19249 9: 91772E NA <NA> NA 2 2005-06-30 108.74579 10: 91772E 4 2005-10-01 1.2467408 3 2005-09-30 113.42261 11: 91772E NA <NA> NA 4 2005-12-31 120.04491 12: 91772E NA <NA> NA 5 2006-03-31 124.35588
In the meantime the data.table authors have introduced non-equi joins (v1.9.8). You can also use that to solve this problem. Using a non-equi join you just need:
companies[, `:=` (start = compDate - days(90), end = compDate + days(15))]
dividends[companies, on = .(Sedol, divDate >= start, divDate <= end)]
to get the intended result.
Used data (the same as in the question, but without the creation of the keys):
set.seed(1337)
companies <- data.table(companyID = 1:10, Sedol = rep(c("91772E", "7A662B"), each = 5),
compDate = (as.Date("2005-04-01") + months(seq(0, 12, 3))) - days(1),
MktCap = c(100 + cumsum(rnorm(5,5)), 50 + cumsum(rnorm(5,1,5))))
dividends <- data.table(DivID = 1:7, Sedol = c(rep('91772E', each = 4), rep('7A662B', each = 3)),
divDate = as.Date(c('2004-11-19','2005-01-13','2005-01-29','2005-10-01','2005-06-29','2005-06-30','2006-04-17')),
DivAmnt = rnorm(7, .8, .3))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With