Per row in a data.table, I need to find the closest lower number from a vector. The following minimum working example does the job, but is much too slow, especially for longer pre.numbers vectors (about 1 million elements in the real data).
library(data.table)
set.seed(2)
pre.numbers <- sort(floor(runif(50000, 1, 1000000)))
the.table <- data.table(cbind(rowid=1:10000, current.number=floor(runif(1000, 1, 100000)), closest.lower.number=NA_integer_))
setkey(the.table, rowid)
the.table[, closest.lower.number:=max(pre.numbers[pre.numbers<current.number]), by=rowid]
There must be a smarter way to do it. There is no relationship between the vector-numbers and the numbers in the data.table.
How about this? Using data.table's rolling joins:
DT = data.table(pre = pre.numbers,
current.number = pre.numbers+0.5, key="current.number")
setkey(the.table, current.number)
ans = DT[the.table, roll=Inf, rollends=FALSE]
Since you're dealing with integers, I've just added 0.5 (any number between 0 and 1 should be fine) to create DT from pre.numbers.
The last step performs a LOCF rolling join (last observation carried forward). For each value of current.number (key column), matching rows are looked for in DT's current.number (key column). If there's no match, the last observation is rolled forward. And if the match occurs at the start/end, it results in NA (rollends = FALSE).
To illustrate better as to what happens, consider this case:
# pre.numbers:
# c(10, 11)
# the.table:
# current.numbers
# 9
# 10
# 11
We convert pre.numbers to DT first, which'll result in the columns
# DT:
# pre current.numbers (key col)
# 10 10.5
# 11 11.5
For each value in the.table:
# 9 -> falls before 10.5, LOCF and rollends = FALSE => result is NA
# 10 -> falls before 10.5 => same as above
# 11 -> falls between 10.5 and 11.5, LOCF matches with previous row = 10.5
# corresponding pre = 10.
HTH
Here's the code I used to generate the data:
require(data.table)
set.seed(1L)
pre.numbers = sort(floor(runif(50000, 1, 1000000)))
the.table = data.table(rowid=1:10000, current.number=floor(runif(1000, 1, 100000)))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With