Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Missing results due to geo proximity formula (store locator)

OK - I've been wrestling with this for about 3 months on and off and since I've exhausted every geo proximity formula out there that I've come across and I'm no closer to getting the right results I figured it time to ask for some help.

THE AIM

I'm setting up a fairly basic implementation of a store locator. The user enters their postcode and selects from a predefined list of search radii. The gmaps API generates lat/long coordinates for this address and passes them to a php script. In this script the user coords are queried against a mysql database table (structure below)

post_id int(11)                             
post_type varchar(20)                                
lat   float(10,6)                               
lng   float(10,6)

The results of this query (post ids) are entered into a wordpress query which generates the XML that contains the map marker data. (the wordpress query uses post__in and posts_per_page -1 to display info for all ID generated by the query

THE PROBLEM

In a nutshell, every implementation of the Haversine formula I've come across seems to result in missing markers - specifically any markers that are very close to the users entered coordinates (don't know precisely but I think it's within about 500m). This is a big problem as if the user enters their postcode and there is a store very close to their location it won't show up.

I've tried about 8 different permutations of the forumla that I've dug up from various tutorials with the same results. Below is the formula that I'm currently using on the site which provides all markers except for the those very close to the users entered position:

$center_lat = $_GET["lat"];
$center_lng = $_GET["lng"];
$radius = $_GET["radius"];

// Calculate square radius search

$lat1 = (float) $center_lat - ( (int) $radius / 69 );
$lat2 = (float) $center_lat + ( (int) $radius / 69 );
$lng1 = (float) $center_lng - (int) $radius / abs( cos( deg2rad( (float) $center_lat ) ) * 69 );
$lng2 = (float) $center_lng + (int) $radius / abs( cos( deg2rad( (float) $center_lat ) ) * 69 );

$sqlsquareradius = "
SELECT 
post_id, lat, lng
FROM
wp_geodatastore
WHERE
lat BETWEEN ".$lat1." AND ".$lat2."
AND
lng BETWEEN ".$lng1." AND ".$lng2."
"; // End $sqlsquareradius

// Create sql for circle radius check
$sqlcircleradius = "
SELECT
t.post_id,
3956 * 2 * ASIN(
    SQRT(
        POWER(
            SIN(
                ( ".(float) $center_lat." - abs(t.lat) ) * pi() / 180 / 2
            ), 2
        ) + COS(
            ".(float) $center_lat." * pi() / 180
        ) * COS(
            abs(t.lat) * pi() / 180
        ) * POWER(
            SIN(
                ( ".(float) $center_lng." - t.lng ) * pi() / 180 / 2
            ), 2
        )
    )
) AS distance
FROM
(".$sqlsquareradius.") AS t
HAVING
distance <= ".(int) $radius."
ORDER BY distance
"; // End $sqlcircleradius


$result = mysql_query($sqlcircleradius);

$row = mysql_fetch_array( $result );

while($row = mysql_fetch_array( $result )) {
// the contents of each row
$post_ids[] = $row['post_id'];
}

There was 1 formula that I tried that was suggested by Mike Pelley here: Geolocation SQL query not finding exact location

This formula seemed to show markers that were very close to the users entered location but missed out others that should have been displayed within the given radius. To clear up any confusion this is the code I used:

$center_lat = $_GET["lat"];
$center_lng = $_GET["lng"];
$radius = $_GET["radius"];

$sql = "
SELECT post_id, lat, lng, 
truncate((degrees(acos( sin(radians(lat)) 
* sin(radians(".$center_lat.")) 
+ cos(radians(lat)) 
* cos(radians(".$center_lat.")) 
* cos(radians(".$center_lng." - lng) ) ) ) 
* 69.09*1.6),1) as distance 
FROM wp_geodatastore HAVING distance <= ".$radius." ORDER BY distance desc
"; // End $sqlcircleradius


$result = mysql_query($sql);

$row = mysql_fetch_array( $result );

while($row = mysql_fetch_array( $result )) {
// Print out the contents of each row
$post_ids[] = $row['post_id'];
}

THE REQUEST

Basically I would like to know why neither of these blocks of code are displaying the correct markers. If anyone can suggest any improvements to the code or can point me towards some resource that I might have missed that would be great

EDIT

Thought my psudeo answer was working but as it turns out that was still having problems. I've ended up going for a very different tack now and I'm using a very good jquery store locator which can be found here: http://www.bjornblog.com/web/jquery-store-locator-plugin

Won't work for every project out there but for my needs it's perfect (and works!)

like image 569
FourStacks Avatar asked Jan 04 '12 13:01

FourStacks


1 Answers

EDIT This location-finder comes up often enough that I've written an article on it.

http://www.plumislandmedia.net/mysql/haversine-mysql-nearest-loc/

Original Post

Let's start by dealing with the haversine formula once for all, by putting it into a stored function so we can forget about its gnarly details. NOTE: This whole solution is in statute miles.

DELIMITER $$

CREATE
    FUNCTION distance(lat1 FLOAT, long1 FLOAT, lat2 FLOAT, long2 FLOAT)
    RETURNS FLOAT
    DETERMINISTIC NO SQL
    BEGIN
    RETURN (3959 * ACOS(COS(RADIANS(lat1)) 
                 * COS(RADIANS(lat2)) 
                 * COS(RADIANS(long1) - RADIANS(long2)) 
                 + SIN(RADIANS(lat1)) 
                 * SIN(RADIANS(lat2)) 
                )); 
    END$$

DELIMITER ;

Now let's put together a query that searches on the bounding box, and then refines the search with our distance function and orders by distance

Based on the PHP code in the your question:

Assume $radius is your radius, $center_lat, $center_lng is your reference point.

$sqlsquareradius = "
SELECT post_id, lat, lng
  FROM
(
    SELECT post_id, lat, lng,
           distance(lat, lng, " . $center_lat . "," . $center_lng . ") AS distance
      FROM wp_geodatastore
     WHERE lat >=  " . $center_lat . " -(" . $radius . "/69)
       AND lat <=  " . $center_lat . " +(" . $radius . "/69)
       AND lng >=  " . $center_lng . " -(" . $radius . "/69)
       AND lng <=  " . $center_lng . " +(" . $radius . "/69)
)a
WHERE distance <= " . $radius . "
ORDER BY distance
";

Notice a few things about this.

First, it does the bounding box computation in SQL rather than in PHP. There's no good reason for that, except keeping all the computation in one environment. (radius / 69) is the number of degrees in radius statute miles.

Second, it doesn't fiddle with the size of the longitudinal bounding box based on latitude. Instead it uses a simpler, but slightly too large, bounding box. This bounding box catches a few extra records, but the distance measurement gets rid of them. For your typical postcode / store finder app the performance difference is negligible. If you were searching many more records (e.g. a database of all utility poles) it might not be so trivial.

Third, it uses a nested query to do the distance elimination, to avoid having to run the distance function more than once for each item.

Fourth, it orders by distance ASCENDING. This means your zero-distance results should show up first in the result set. It usually makes sense to list nearest things first.

Fifth, it uses FLOAT rather than DOUBLE throughout. There's a good reason for that. The haversine distance formula is not perfect, because it makes the approximation that the earth is a perfect sphere. That approximation happens to break down at roughly the same level of accuracy as the epsilon for FLOAT numbers. So DOUBLE is deceptive numerical overkill for this problem. (Don't use this haversine formula to do civil engineering work like parking lot drainage, or you will get big puddles a couple of epsilon, a few inches, deep, I promise.) It's fine for store-finder applications.

Sixth, you are definitely going to want to create an index for your lat column. If your table of locations doesn't change very often, it will help to create an index for your lng column as well. But your lat index will give you most of your query performance gain.

Lastly, I tested the stored procedure and the SQL, but not the PHP.

Reference: http://www.scribd.com/doc/2569355/Geo-Distance-Search-with-MySQL Also my experience with a bunch of proximity finders for health care facilities.

--------------- EDIT --------------------

If you don't have a user interface that lets you define a stored procedure, that's a nuisance. At any rate, PHP lets you use numbered parameters in the sprintf call, so you can generate the whole nested statement like this. NOTE: You might need %$1f etc. You'll need to experiment with this.

$sql_stmt = sprintf ("
  SELECT post_id, lat, lng
    FROM
  (
    SELECT post_id, lat, lng,
           (3959 * ACOS(COS(RADIANS(lat)) 
                 * COS(RADIANS(%$1s)) 
                 * COS(RADIANS(lng) - RADIANS(%$2s)) 
                 + SIN(RADIANS(lat)) 
                 * SIN(RADIANS(%$1s)) 
            ))
           AS distance
      FROM wp_geodatastore
     WHERE lat >=  %$1s -(%$3s/69)
       AND lat <=  %$1s +(%$3s/69)
       AND lng >=  %$2s -(%$3s/69)
       AND lng <=  %$2s +(%$3s/69)
  )a
   WHERE distance <= %$3s
   ORDER BY distance
",$center_lat,$center_lng, $radius);
like image 169
O. Jones Avatar answered Nov 02 '22 01:11

O. Jones