Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Product Lookup By Postal/Zip Code | Haversine Algorithm | Performance

I have an application that searches for items based on a postal code.

When searching for the postal code, I return all products that are from that City/Neighborhood (done by parsing the postal/zip codes).

I now need to sort these products based on distance from the original postal/zip code.

I have the Lat/Long stored on the DB and plan to use the Haversine formula to calculate an apprx distance from the original query.

My question is, where should this be calculated. Should I do this in a stored procedure, before returning my data set?

Or should I return my data set, with my Lat/Long, and calculate it server side before returning to user.

The calculation may need to be performed for up to 1000 results.

like image 944
Mark Avatar asked Sep 26 '22 11:09

Mark


2 Answers

Typically, DB servers are IO-bound rather than CPU-bound. YMMV, but if your case is typical it would be desirable to perform the Haversine calculation on the DB server.

I would recommend using a customized lookup table for your arcsine calculations, as you can probably provide approximate distances on a logarithmic scale such as:

  • 100m,

  • 300m,

  • 1km,

  • 3km,

  • 10km,

  • 30km,

  • > 30km

    and then use linear interpolation as a refinement.

For the typical distances encountered with a single metropolitan area, you might consider using just 2 or 3 terms of the Taylor expansion for sin and cos rather than more exact calculations:

  • sin(x) =~ x - x^3 / 6 + x^5 / 120
  • cos(x) =~ 1 - x^2 / 2 + x^4 / 24

Recall also that for a converging Taylor Series, the error after the n'th term is strictly less than the magnitude of the (n+1)'st term. This allows you to efficiently terminate a calculation once a desired accuracy has been achieved, which in general for the Haversine formula is only 0.5% due to the Earth not being a uniform sphere.

like image 66
Pieter Geerkens Avatar answered Sep 28 '22 06:09

Pieter Geerkens


Are you using a version of SQL Server 2008 or above? If so, I recommend using the build-in geography data type rather than doing the Haversine calculation directly. You could have a table of zip codes that has the zip code (e.g. 90210) and either the central point of the zip code or the entire area covered by the zip code in another column (or both if that makes sense for your application). Then, you can use the STDistance() function to calculate the distance. Also, with spatial indexing, you can get a list ranked by distance without too much effort.

like image 42
Ben Thul Avatar answered Sep 28 '22 04:09

Ben Thul