Sample data:
Usage = data.table(
feature = 'M11',
startDate = structure(rep(17130, 17155, c(4, 3)), class = "Date"),
cc = 'X6', vendor = 'Z1'
)
Limits = data.table(
vendorId = 'Z1',
featureId = 'M11',
costcenter ='X6', oldLimit = 1:6,
date = structure(17044 + c(91, 61, 30, 0, 105, 75), class = "Date")
)
I am trying to add a column limit
to the Usage
data.table
by looking at the Limits
data.table
. This is to find out what was the limit for that feature
, costCenter
, vendor
combination at the time of its corresponding usage.
However when I try to roll-join using the below code, I get strange results. I get lot of NA
s for my data, so created sample data as above. Below is my roll-join code.
Usage[Limits, limitAtStartDate:= i.oldLimit,
on = c(cc="costcenter", feature="featureId",
vendor="vendorId", startDate="date" ),
roll=TRUE, verbose=TRUE][]
# feature startDate cc vendor limitAtStartDate
# 1: M11 2016-11-25 X6 Z1 6
# 2: M11 2016-11-25 X6 Z1 NA
# 3: M11 2016-11-25 X6 Z1 NA
# 4: M11 2016-11-25 X6 Z1 NA
# 5: M11 2016-12-20 X6 Z1 5
# 6: M11 2016-12-20 X6 Z1 NA
# 7: M11 2016-12-20 X6 Z1 NA
Why is that 5
& 6
are set only for one record for limitAtStartDate
?
I am expecting 5
for all rows with date 2016-12-20
and 6
for all 2016-11-25
. Please let me know where I am going wrong. I am using data.table
version 1.10.0
.
When performing an X[Y]
join in data.table
what you are basically doing is for each value in Y
you are trying to find a value in X
. Hence, the resulting join will be of length of the Y
s table. In your case, you are trying to find a value in Limits
for each value in Usage
and get a 7 length vector. Hence, you probably should join the other way around and then store it back into Limits
Limits[Usage,
oldLimit,
on = .(costcenter = cc, featureId = feature, vendorId = vendor, date = startDate),
roll = TRUE]
## [1] 6 6 6 6 5 5 5
As a side note, for very (and some times not so) simple cases you could just use findInterval
.
setorder(Limits, date)[findInterval(Usage$startDate, date), oldLimit]
## [1] 6 6 6 6 5 5 5
It is a very efficient function that have some caveats though
data.table
(e.g. roll = 2
instead of just roll = TRUE
)data.table
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With