I have 2 lists with X,Y coordinates of points. List 1 contains more points than list 2.
The task is to find pairs of points in a way that the overall euclidean distance is minimized.
I have a working code, but i don't know if this is the best way and I would like to get hint what I can improve for result (better algorithm to find the minimum ) or speed, because the list are about 2000 elements each.
The round in the sample vectors is implemented to get also points with same distances. With the "rdist" function all distances are generated in "distances". Than the minimum in the matrix is used to link 2 point ("dist_min"). All distances of these 2 points are now replaced by NA and the loop continues by searching the next minimum until all points of list 2 have a point from list 1. At the end I have added a plot for visualization.
require(fields)
set.seed(1)
x1y1.data <- matrix(round(runif(200*2),2), ncol = 2) # generate 1st set of points
x2y2.data <- matrix(round(runif(100*2),2), ncol = 2) # generate 2nd set of points
distances <- rdist(x1y1.data, x2y2.data)
dist_min <- matrix(data=NA,nrow=ncol(distances),ncol=7) # prepare resulting vector with 7 columns
for(i in 1:ncol(distances))
{
inds <- which(distances == min(distances,na.rm = TRUE), arr.ind=TRUE)
dist_min[i,1] <- inds[1,1] # row of point(use 1st element of inds if points have same distance)
dist_min[i,2] <- inds[1,2] # column of point (use 1st element of inds if points have same distance)
dist_min[i,3] <- distances[inds[1,1],inds[1,2]] # distance of point
dist_min[i,4] <- x1y1.data[inds[1,1],1] # X1 ccordinate of 1st point
dist_min[i,5] <- x1y1.data[inds[1,1],2] # Y1 coordinate of 1st point
dist_min[i,6] <- x2y2.data[inds[1,2],1] # X2 coordinate of 2nd point
dist_min[i,7] <- x2y2.data[inds[1,2],2] # Y2 coordinate of 2nd point
distances[inds[1,1],] <- NA # remove row (fill with NA), where minimum was found
distances[,inds[1,2]] <- NA # remove column (fill with NA), where minimum was found
}
# plot 1st set of points
# print mean distance as measure for optimization
plot(x1y1.data,col="blue",main="mean of min_distances",sub=mean(dist_min[,3],na.rm=TRUE))
points(x2y2.data,col="red") # plot 2nd set of points
segments(dist_min[,4],dist_min[,5],dist_min[,6],dist_min[,7]) # connect pairwise according found minimal distance
This is a fundamental problem in combinatorial optimization known as the assignment problem. One approach to solving the assignment problem is the Hungarian algorithm which is implemented in the R package clue:
require(clue)
sol <- solve_LSAP(t(distances))
We can verify that it outperforms the naive solution:
mean(dist_min[,3])
# [1] 0.05696033
mean(sqrt(
(x2y2.data[,1] - x1y1.data[sol, 1])^2 +
(x2y2.data[,2] - x1y1.data[sol, 2])^2))
#[1] 0.05194625
And we can construct a similar plot to the one in your question:
plot(x1y1.data,col="blue")
points(x2y2.data,col="red")
segments(x2y2.data[,1], x2y2.data[,2], x1y1.data[sol, 1], x1y1.data[sol, 2])
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With