I have a dataframe that is the start and end times for a series of vectors. So we have a bunch of x vectors and y vectors and I want to compare the minimum distance between the 2 vectors. If two vectors have any overlapping parts then the min distance is 0 (in this application you can't have a negative distance).
Here's what the dataframe looks like (below is an easy way to grab it):
x.start x.end y.start y.end
1 3 6 7 8
2 10 14 19 22
3 19 25 45 45
4 33 33 66 68
5 100 101 90 101
6 130 150 134 153
So I want to go row by row on the x vectors and for each x vector compare it to all y vectors and find the minimum distance between the two.
Below I accomplish this with a nested for
loop but I need to repeat this many times with more vectors so speed counts. This is slow. What is the most efficient way to accomplish this task?
Desired output:
## > out
## [1] 1 2 0 11 0 0
I'd prefer to keep this in base R but if you have a faster way that is OS independent I'm open.
The data:
dat <- data.frame(
x.start = c(3, 10, 19, 33, 100, 130),
x.end = c(6, 14, 25, 33, 101, 150),
y.start = c(7, 19, 45, 66, 90, 134),
y.end = c(8, 22, 45, 68, 101, 153)
)
Note that looking at my answer below may provide better understanding of the task. I'll benchmark the results after a few competitors have risen.
Here is the desired output as a dataframe for easier comparison and understanding:
min_dist x.start x.end y.start y.end
1 1 3 6 7 8
2 2 10 14 19 22
3 0 19 25 45 45
4 11 33 33 66 68
5 0 100 101 90 101
6 0 130 150 134 153
A visualization of the 2 sets of vectors:
So each of the red segments I'd like to know the minimum distance to the closest y vector (blue segments); though I see the x vector 33:33 and the y vector 45:45 don't show up but I think this gives a visual depiction of the problem.
Benchmarking Results: Running Tally
Unit: microseconds
expr min lq median uq max neval
GEEKTRADER() 5386.186 5553.659 5603.341 5678.214 68297.171 5000
TRINKER() 1421.887 1480.198 1496.992 1517.985 63619.596 5000
RICARDO_OPT1() 4748.483 4892.631 4974.968 5110.952 156400.446 5000
RICARDO_OPT2() 7387.463 7583.859 7694.418 7845.564 70200.949 5000
FOTNELTON() 437.576 462.767 473.963 486.091 6109.724 5000
FOTNELTON_EDIT() 356.871 379.730 390.460 402.122 3576.174 5000
RICARDO_SIMPLE_ANS() 801.444 842.496 855.091 870.952 3923.715 5000
ALEXIS() 343.343 385.328 397.923 408.652 4169.093 5000
I think the easiest and possibly fastest way to do it is as follows:
apply(dat, 1, function(d) {
overlap <- (dat$y.end >= d[1] & dat$y.end <= d[2]) |
(dat$y.start >= d[1] & dat$y.start <= d[2])
if (any(overlap)) 0
else min(abs(c(d[1] - dat$y.end[!overlap], dat$y.start[!overlap] - d[2])))
})
EDIT: overlap
can be much simpler:
apply(dat, 1, function(d) {
overlap <- dat$y.end >= d[1] & dat$y.start <= d[2]
if (any(overlap)) 0
else min(abs(c(d[1] - dat$y.end[!overlap], dat$y.start[!overlap] - d[2])))
})
Not sure if this is fastest. But here is one way to do it.
apply(dat[,1:2], MARGIN=1, FUN=function(x) {
min(apply(dat[,3:4], MARGIN = 1, FUN = function(y){
X <- c(t(x))
Y <- c(t(y))
#Check if the two line segments overlap else find minimum distance between the 2 edges of each line segments
if (diff(range(c(X,Y))) <= diff(X) + diff(Y)){
return(0)
} else {
return(min(abs(outer(Y, X, "-"))))
}
}))
})
## [1] 1 2 0 11 0 0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With