The main problem is described in the question title. Going straight to the example below.
I have two datasets:
library(data.table)
dt1 <- data.table(date = as.Date("2015-06-28")+c(0L,3L,5L,7L),
key="date")
dt2 <- data.table(date = as.Date("2015-06-30")+c(0:1,4L),
val = letters[7:9],
dummy = rep(NA,3),
key="date")
I want to have val
column from dt2
added to dt1
using rolling join.
The following statement will produce similar output to the expected one:
dt2[dt1, roll=TRUE]
# date val dummy
# 1: 2015-06-28 NA NA
# 2: 2015-07-01 h NA
# 3: 2015-07-03 h NA
# 4: 2015-07-05 i NA
The are two problems with that statement:
1. I didn't want to have dummy
column
2. I want to do this by reference:
address(dt1)
# [1] "0x3b57540"
address(dt2[dt1, roll=TRUE])
# [1] "0x3b4e1f0"
So, I'm looking for rolling join and add column by reference for my dt1
and dt2
, expected output:
# date val
# 1: 2015-06-28 NA
# 2: 2015-07-01 h
# 3: 2015-07-03 h
# 4: 2015-07-05 i
And of course address(dt1)
should match to address
of the magic statement.
Here's a method that should scale.
address(dt1)
# [1] "0x265a060"
ix = dt2[dt1, roll=TRUE, which=TRUE]
dt1[, val := dt2[ix, val]]
dt1
# date val
# 1: 2015-06-28 NA
# 2: 2015-07-01 h
# 3: 2015-07-03 h
# 4: 2015-07-05 i
address(dt1)
# [1] "0x265a060"
Or without creating the index, directly as:
dt1[, val := dt2[dt1, val, roll = TRUE]] ## (1)
Note that this is more efficient than doing:
dt1[, val := dt2[dt1, roll = TRUE]$val] ## (2)
(2) performs the entire join (materialises all the columns) and then extracts val
, where as (1) directly just extracts column val
.
Here's a workaround; not perfect because it still involves creating extra memory, though there may be a way to avoid this that's not coming to me right away (anyway the extra memory may be minimal):
> address(dt1)
[1] "0x57b5230"
rng<-range(dt1[,range(date)],dt2[,range(date)])
x<-data.table(date=seq(from=rng[1],to=rng[2],by="day"),
key="date")
> address(x)
[1] "0x6aa2df0"
x[dt2,setdiff(names(dt2),"date"):=mget(setdiff(names(dt2),"date"))
][,val:=zoo::na.locf(val,na.rm=F)]
> address(x)
[1] "0x6aa2df0"
> dt1[x,val:=i.val][]
date val
1: 2015-06-28 NA
2: 2015-07-01 h
3: 2015-07-03 h
4: 2015-07-05 i
> address(dt1)
[1] "0x57b5230"
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With