Given the following tables:
table A (id, latitude, longitude)
table B (id, latitude, longitude)
how do I build an efficient T-SQL query that associates each row in A with the closest row in B?
The ResultSet should contains all the rows in A and associate them with 1 and only 1 element in B. The format that I'm looking for is the following:
(A.id, B.id, distanceAB)
I have a function that calculates the distance given 2 pairs of latitude and longitude. I tried something using order by ... limit 1
and/or rank() over (partition by ...) as rowCount ... where rowCount = 1
but the result is either not really what I need or it takes too long to return.
Am I missing something?
There's no way to get around the fact that you're going to have to compare every record in A with every record in B, which is obviously going to scale poorly if both A and B contain a lot of records.
That being said, this will return correct results:
SELECT aid, bid, distanceAB
FROM (
SELECT aid, bid, distanceAB,
dense_rank() over (partition by aid order by distanceAB) as n
FROM (
SELECT a.id as aid, B.id as bid,
acos(sin(radians(A.lat)) * sin(radians(B.lat)) +
cos(radians(A.lat)) * cos(radians(B.lat)) *
cos(radians(A.lon - B.lon))) * 6372.8 as distanceAB
FROM A cross join B
) C
) D
WHERE n = 1
This will return in a reasonable amount of time if your sets aren't too large. With 3 locations in A and 130,000 or so in B, it takes about one second on my machine. 1,000 records in each takes about 40s. Like I said, it scales poorly.
It should be noted that Sparky's answer can return incorrect results under certain circumstances. Suppose your A location is at +40,+100. +40,+111 would not be returned, even though it's closer than +49,+109.
This is one approach that should have deceent performance, but a big caveat is that it might not find any results
select top 1 a.id,b.id,dbo.yourFunction() as DistanceAB
from a
join b on b.latitude between a.latitude-10 and a.latitude+10 and
b.longititude between a.longitude-10 and b.longittude+10
order by 3
What you are basically doing is looking for any B row within roughly a 20 unit radius of A and then sorting it by your function to determine the closest. You can adjust the unit radius as needed. While it is not exact, it should reduce the size of the result set and should give you decent performance results.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With