Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Spatial queries on AWS SimpleDB

I would like to know what people suggest as efficient ways of doing a spatial query in an Amazon Web Services SimpleDB?

By spatial query I mean finding objects in a given radius of a latitude and longitude.

like image 819
user293895 Avatar asked Aug 07 '12 09:08

user293895


People also ask

What are the features of Amazon SimpleDB?

Amazon SimpleDB automatically manages infrastructure provisioning, hardware and software maintenance, replication and indexing of data items, and performance tuning. Highly available: Amazon SimpleDB automatically creates multiple geographically distributed copies of each data item you store.

Is AWS SimpleDB deprecated?

SimpleDB is deprecated, more expensive than DDB, and kind of weird to use. Backing your keystore with a deprecated service just sounds like a road to many sleepless nights ;) The utility does depend on three external services: DynamoDB, KMS, and IAM (for permissioning).

Which type of service is used for Amazon SimpleDB?

Amazon SimpleDB is a highly available NoSQL data store that offloads the work of database administration. Developers simply store and query data items via web services requests and Amazon SimpleDB does the rest.


1 Answers

SimpleDB doesn't currently offer any built-in spatial search operations but that doesn't mean it can't be done. There's several methods of implementing geospatial searches in non-geospatially aware databases such as SimpleDB and all of them center around the idea of using the database to retrieve a rough first selection based on a geospatial bounding box and then filtering the returned data in your application using more accurate algorithms such as the Haversine formula.

You could store the latitude and longitude as (zero-padded and normalized) numeric attributes and then perform a double range query (lat >= minLat and lat <= maxLat and lon >= minLat and lon <= maxLat) but since neither of theese predicates are selective (each predicate matches a lot of items) it's not ideal (see Tuning Queries).

A better way would be using GeoHashes.

Geohashes offer properties like arbitrary precision, similar prefixes for nearby positions, and the possibility of gradually removing characters from the end of the code to reduce its size (and gradually lose precision).

As a practical example, the Geohash 6gkzwgjzn820 decodes to the coordinates -25.382708 and -49.265506, while the Geohash 6gkzwgjz will decode to -25.383 and -49.266, and if we take a similar position in the same region, such as -25.427 and -49.315, we can see it being encoded as 6gkzmg1w (note the similar prefix).

From http://geohash.org/site/tips.html

With your item positions as GeoHashes you could use the like operator to search for a bounding box (where GeoHash like '6gkzmg1w%') but since the like operator is expensive (Comparison Operators) a better way would be to denormalize the data by storing each GeoHash prefix level (how many depends on your required search precision) as a separate attribute (GeoHash6 GeoHash8 etc) and then use a simple equality predicate (where Geohash8 = '6gkzmg1w').

Now on to the downside of GeoHashes. Since you can't make any assumption of a GeoHash being centered within your search box you have to search all neighboring prefixes as well. The process is excellently described by geohash-js

Geohash also has the property that as the number of digits decreases (from the right), accuracy degrades. This property can be used to do bounding box searches, as points near to one another will share similar Geohash prefixes.

However, because a given point may appear at the edge of a given Geohash bounding box, it is necessary to generate a list of Geohash values in order to perform a true proximity search around a point. Because the Geohash algorithm uses a base-32 numbering system, it is possible to derive the Geohash values surrounding any other given Geohash value using a simple lookup table.

So, for example, 1600 Pennsylvania Avenue, Washington DC resolves to: 38.897, -77.036

Using the geohash algorithm, this latitude and longitude is converted to: dqcjqcp84c6e

A simple bounding box around this point could be described by truncating this geohash to: dqcjqc

However, 'dqcjqcp84c6e' is not centered inside 'dqcjqc', and searching within 'dqcjqc' may miss some desired targets.

So instead, we can use the mathematical properties of the Geohash to quickly calculate the neighbors of 'dqcjqc'; we find that they are: 'dqcjqf','dqcjqb','dqcjr1','dqcjq9','dqcjqd','dqcjr4','dqcjr0','dqcjq8'

This gives us a bounding box around 'dqcjqcp84c6e' roughly 2km x 1.5km and allows for a database search on just 9 keys: SELECT * FROM table WHERE LEFT(geohash,6) IN ('dqcjqc', 'dqcjqf','dqcjqb','dqcjr1','dqcjq9','dqcjqd','dqcjr4','dqcjr0','dqcjq8');

Translated to a SimpleDB query that'd be where GeoHash6 in('dqcjqc', 'dqcjqf', 'dqcjqb', 'dqcjr1', 'dqcjq9', 'dqcjqd', 'dqcjr4', 'dqcjr0', 'dqcjq8') and then you'll do your Haversine filtering on the results in order to only get the items that's within your search radius.

like image 166
Markus Olsson Avatar answered Oct 17 '22 08:10

Markus Olsson