I have two vectors of integers, and for each element of the second vector I want to find the minumum distance to any element of the first vector - for example
obj1 <- seq(0, 1000, length.out=11)
obj2 <- 30:50
min_diff <- sapply(obj2, function(x) min(abs(obj1-x)))
min_diff
returns
[1] 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
Is there a more efficient way? I want to scale this up to thousands (millions?) of both obj1 & obj2.
Thanks, Aaron
I would use a step function sorted on the first vector. This will avoid loops and is pretty fast in R.
x <- rnorm(1000)
y <- rnorm(1000)
sorted.x <- sort(x)
myfun <- stepfun(sorted.x, 0:length(x))
Now myfun(1)
will give you the index of the largest element of sorted.x
whose value is less than 1
. In my case,
> myfun(1)
[1] 842
> sorted.x[842]
[1] 0.997574
> sorted.x[843]
[1] 1.014771
So you know that the closest element is either sorted.x[myfun(1)]
or sorted.x[myfun(1) + 1]
. Consequently (and padding for 0),
indices <- pmin(pmax(1, myfun(y)), length(sorted.x) - 1)
mindist <- pmin(abs(y - sorted.x[indices]), abs(y - sorted.x[indices + 1]))
start by sorting obj1
then you can do a binary search in obj1 for each element of obj2. knowing where the element would be, you can compare the distance to the two nearby elements of obj1, giving you the minimum distance.
runtime (where n1 = |obj1| and n2 = |obj2|): (n1 + n2) log (n1)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With