Sample data:
Usage = data.table(
feature = 'M11',
startDate = structure(rep(17130, 17155, c(4, 3)), class = "Date"),
cc = 'X6', vendor = 'Z1'
)
Limits = data.table(
vendorId = 'Z1',
featureId = 'M11',
costcenter ='X6', oldLimit = 1:6,
date = structure(17044 + c(91, 61, 30, 0, 105, 75), class = "Date")
)
I am trying to add a column limit to the Usage data.table by looking at the Limits data.table. This is to find out what was the limit for that feature, costCenter, vendor combination at the time of its corresponding usage.
However when I try to roll-join using the below code, I get strange results. I get lot of NAs for my data, so created sample data as above. Below is my roll-join code.
Usage[Limits, limitAtStartDate:= i.oldLimit,
on = c(cc="costcenter", feature="featureId",
vendor="vendorId", startDate="date" ),
roll=TRUE, verbose=TRUE][]
# feature startDate cc vendor limitAtStartDate
# 1: M11 2016-11-25 X6 Z1 6
# 2: M11 2016-11-25 X6 Z1 NA
# 3: M11 2016-11-25 X6 Z1 NA
# 4: M11 2016-11-25 X6 Z1 NA
# 5: M11 2016-12-20 X6 Z1 5
# 6: M11 2016-12-20 X6 Z1 NA
# 7: M11 2016-12-20 X6 Z1 NA
Why is that 5 & 6 are set only for one record for limitAtStartDate?
I am expecting 5 for all rows with date 2016-12-20 and 6 for all 2016-11-25. Please let me know where I am going wrong. I am using data.table version 1.10.0.
When performing an X[Y] join in data.table what you are basically doing is for each value in Y you are trying to find a value in X. Hence, the resulting join will be of length of the Ys table. In your case, you are trying to find a value in Limits for each value in Usage and get a 7 length vector. Hence, you probably should join the other way around and then store it back into Limits
Limits[Usage,
oldLimit,
on = .(costcenter = cc, featureId = feature, vendorId = vendor, date = startDate),
roll = TRUE]
## [1] 6 6 6 6 5 5 5
As a side note, for very (and some times not so) simple cases you could just use findInterval.
setorder(Limits, date)[findInterval(Usage$startDate, date), oldLimit]
## [1] 6 6 6 6 5 5 5
It is a very efficient function that have some caveats though
data.table (e.g. roll = 2 instead of just roll = TRUE)data.table
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With