Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the best way to query a database for records within n miles of a zip code?

I have a list of records in my database and each record is associated with a zip code.

What is the "best-practice" for querying all the records in my database to find all entries that are within n miles of another zip code?

Each zip code has a lat/long associated with it in the database so I know I'll have to use that. However, I can't imagine running any sort of distance formula on each pair of zip codes, converting to miles and rejecting those that aren't within my radius.

That seems awfully computationally expensive for such a common query.

I've also considered doing an all-pairs pre-computation but it seems too large to consider also. There are approximately ~40,000 zip codes in the US. So, an all pairs database of each zip code would be (40,000)^2, or 1.6billion entries.

I know this is a common problem on websites so hopefully someone can point me in the right direction for the best way. I'm using SQL Server 2008 and if there are pre-built solutions out there then great, because I really don't want to re-invent the wheel in this instance.


Related Question: Getting all zip codes within radius (this didn't help me)
Also, I know of this SourceForge project but it is derelict and no longer in use.

like image 705
mmcdole Avatar asked Feb 09 '09 09:02

mmcdole


2 Answers

I would run a query that returned all records bracketed in the square envelope encompasing the radial search circle (minlat < lat < maxlat and minlong < long < maxlong), and then post-process this to return only the points within the radius circle itself. (Make sure your lat and long fields are indexed).

If you wanted to get fancy, SQL server supports spatial indexes.

like image 169
mackenir Avatar answered Nov 15 '22 12:11

mackenir


I run a site that needs to run this query about once per second per user, and here's what I've learned:

First off, make sure your location table has indexes on Lat and Lon. That's the difference between 20ms and 15s response times if you have millions of records.

Start off with a bounding-box query to get a set of locations to work with. Then calculate distances on those, sort, and if you're fussy about accuracy, filter a few out.

Frankly, I wouldn't worry about pre-computing anything. Like I say, I run this type of query against a location table with 6,000,000 entries, and it usually returns results in <50ms. Depending on your needs, that really aught to be fast enough.

Good luck!

like image 38
Jason Kester Avatar answered Nov 15 '22 12:11

Jason Kester