Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Distance-based JOIN given Latitude/Longitude

Given the following tables:

table A (id, latitude, longitude)
table B (id, latitude, longitude)

how do I build an efficient T-SQL query that associates each row in A with the closest row in B?

The ResultSet should contains all the rows in A and associate them with 1 and only 1 element in B. The format that I'm looking for is the following:

(A.id, B.id, distanceAB)

I have a function that calculates the distance given 2 pairs of latitude and longitude. I tried something using order by ... limit 1 and/or rank() over (partition by ...) as rowCount ... where rowCount = 1 but the result is either not really what I need or it takes too long to return.

Am I missing something?

like image 937
Marsellus Wallace Avatar asked Jan 20 '12 21:01

Marsellus Wallace


2 Answers

There's no way to get around the fact that you're going to have to compare every record in A with every record in B, which is obviously going to scale poorly if both A and B contain a lot of records.

That being said, this will return correct results:

SELECT aid, bid, distanceAB
FROM (
  SELECT aid, bid, distanceAB,
    dense_rank() over (partition by aid order by distanceAB) as n
  FROM (
    SELECT a.id as aid, B.id as bid,
      acos(sin(radians(A.lat)) * sin(radians(B.lat)) +
        cos(radians(A.lat)) * cos(radians(B.lat)) *
        cos(radians(A.lon - B.lon))) * 6372.8 as distanceAB
    FROM A cross join B
  ) C
) D
WHERE n = 1

This will return in a reasonable amount of time if your sets aren't too large. With 3 locations in A and 130,000 or so in B, it takes about one second on my machine. 1,000 records in each takes about 40s. Like I said, it scales poorly.

It should be noted that Sparky's answer can return incorrect results under certain circumstances. Suppose your A location is at +40,+100. +40,+111 would not be returned, even though it's closer than +49,+109.

like image 196
Chad Avatar answered Sep 22 '22 10:09

Chad


This is one approach that should have deceent performance, but a big caveat is that it might not find any results

    select top 1 a.id,b.id,dbo.yourFunction() as DistanceAB
    from a 
    join b on b.latitude between a.latitude-10 and a.latitude+10 and
              b.longititude between a.longitude-10 and b.longittude+10
    order by 3

What you are basically doing is looking for any B row within roughly a 20 unit radius of A and then sorting it by your function to determine the closest. You can adjust the unit radius as needed. While it is not exact, it should reduce the size of the result set and should give you decent performance results.

like image 39
Sparky Avatar answered Sep 20 '22 10:09

Sparky