I have a large database full of customers, implemented in SQL Server 2005. Customers each have a latitude and longitude, represented as Decimal(18,15). The most important search query in the database tries to find all customers close to a certain location like this:
( (Addresses.Latitude - @SearchInLat) BETWEEN -1 * @LatitudeBound AND @LatitudeBound)
AND ( (Addresses.Longitude - @SearchInLng) BETWEEN -1 * @LongitudeBound AND @LongitudeBound)
So, this is a very simple method. @LatitudeBound and @LongitudeBound are just numbers, used to pull back all the customers within a rough bounding rectangle of the point (@SearchInLat, @SearchInLng). Once the results get to a client PC, some results are filtered out so that there is a bounding circle rather than a rectangle. (This is done on the client PC to avoid calculating square roots on the server.)
This method has worked well enough in the past. However, we now want to make the search do more interesting things: for instance, making the number of results pulled back more predictable, or letting the user dynamically increase the size of the search radius. To do this, I have been looking at the possibility of upgrading to SQL Server 2008, with its Geography datatype, spatial indexes, and distance functions. My question is this: how fast are these?
The advantage of the simple query we have at the moment is that it is very fast and not performance intensive, which is important as it is called very often. How fast would a query based around something like this:
SearchInPoint.STDistance(Addresses.GeographicPoint) < @DistanceBound
be by comparison? Do the spatial indexes work well, and is STDistance fast?
If you're handling just a standard Lat/Lng pair as you describe, and all you're doing is a simple lookup, then arguably you're not going to gain much in the way of a speed increase by using the Geometry type.
However, if you do want to get more adventurous as you state, then swapping to using the Geometry types will open up a whole world of new possibilities for you, and not just for searches.
For example (based on a project I'm working on), you could (if it's UK data) download the polygon definitions for all the towns, villages and cities in a given area, then cross-reference them to search within a particular town. If you had a road map, you could find which customers live next to major delivery routes, motorways, primary roads, all sorts of things.
You could also do some very fancy reporting. Imagine a map of towns where each outline is plotted, then shaded in with a colour to show the density of customers in that area. Some simple geometry SQL will return you a count straight from the database to graph this kind of information.
Then there's tracking. I don't know what data you handle, or why you have customers, but if you're delivering anything, feeding in the co-ordinates of a delivery van tells you how close it is to a given customer.
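As a rough sketch of the kind of count query meant here, assuming a hypothetical Towns table with a TownName column and a Boundary polygon column, and a spatial GeographicPoint column on Addresses (as in the question's STDistance example; CustomerId is also an assumed name):

```sql
-- Hypothetical schema: Towns(TownName, Boundary) holds one polygon per town,
-- Addresses(CustomerId, GeographicPoint) holds one point per customer.
SELECT t.TownName,
       COUNT(a.CustomerId) AS CustomerCount   -- 0 for towns with no customers
FROM Towns AS t
LEFT JOIN Addresses AS a
       ON t.Boundary.STIntersects(a.GeographicPoint) = 1
GROUP BY t.TownName
ORDER BY CustomerCount DESC;
```

STIntersects is used because it is available on both the geometry and geography types in SQL Server 2008. The result set is one row per town, which is the shape a shaded density map would need.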
As for the question "is STDistance fast?", that's difficult to say. I think a better question is "is it fast in comparison to ...?", because it's hard to answer yes or no unless you have something to compare it to.
Spatial indexes are one of the primary reasons for moving your data to a geographically aware database. They are optimised to produce the best results for a given task but, as with any database, if you create bad indexes then you will get bad performance.
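For reference, a minimal sketch of creating a spatial index in SQL Server 2008, assuming the Addresses table has a geography column called GeographicPoint (the name used in the question) and a clustered primary key, which spatial indexes require. The index name is made up, and the grid settings shown are simply the defaults written out so there is something visible to tune:

```sql
-- Assumes Addresses has a clustered primary key and a geography column
-- named GeographicPoint. IX_Addresses_GeographicPoint is a made-up name.
CREATE SPATIAL INDEX IX_Addresses_GeographicPoint
ON Addresses (GeographicPoint)
USING GEOGRAPHY_GRID
WITH (
    -- Grid density per level; MEDIUM everywhere is the default.
    GRIDS = (LEVEL_1 = MEDIUM, LEVEL_2 = MEDIUM, LEVEL_3 = MEDIUM, LEVEL_4 = MEDIUM),
    -- Maximum number of grid cells recorded per object; 16 is the default.
    CELLS_PER_OBJECT = 16
);
```

For point data like customer addresses, the grid densities are the settings most worth experimenting with during the timing tests, since they are where a "bad index" is most likely to come from.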
In general you should definitely see a speed increase of some sort, because the maths behind the sorting and indexing is more aware of the data's purpose, as opposed to being fairly linear in operation like a normal index.
Bear in mind as well, that the more beefy the SQL server machine is, the better results you'll get.
One last point to mention is management of the data. If you're using a GIS-aware database, that opens the avenue for you to use a GIS package such as ArcMap or MapInfo to manage, correct and visualise your data, meaning corrections are very easy to do by pointing, clicking and dragging.
My advice would be to create a table, side by side with your existing one, that is formatted for spatial operations, then write a few stored procs, do some timing tests and see which comes out best. If you get a significant increase just on the basic operations you're doing, then that's justification alone. If it's about equal, then your decision really hinges on what new functionality you actually want to achieve.
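A minimal sketch of that side-by-side test, under some assumptions: the existing Addresses table has an AddressId key (a hypothetical column name), the coordinates are WGS 84 (SRID 4326), and the search point and radius below are arbitrary illustrative values, not real data.

```sql
-- Side-by-side copy of the address data with a geography column.
-- AddressesSpatial and AddressId are assumed names.
CREATE TABLE AddressesSpatial (
    AddressId       INT NOT NULL PRIMARY KEY,  -- clustered PK, needed for a spatial index
    GeographicPoint GEOGRAPHY NOT NULL
);

INSERT INTO AddressesSpatial (AddressId, GeographicPoint)
SELECT AddressId,
       geography::Point(Latitude, Longitude, 4326)  -- SRID 4326 = WGS 84
FROM Addresses
WHERE Latitude IS NOT NULL AND Longitude IS NOT NULL;

-- Arbitrary test values; STDistance returns metres for SRID 4326.
DECLARE @SearchInPoint GEOGRAPHY, @DistanceBound FLOAT;
SET @SearchInPoint = geography::Point(51.5, -0.12, 4326);
SET @DistanceBound = 5000;

SET STATISTICS TIME ON;   -- report CPU and elapsed time per statement
SET STATISTICS IO ON;     -- report logical and physical reads

SELECT AddressId
FROM AddressesSpatial
WHERE GeographicPoint.STDistance(@SearchInPoint) < @DistanceBound;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Running the existing bounding-rectangle query under the same STATISTICS settings, with and without a spatial index on the new table, gives the "compared to what" numbers. Repeat each run a few times so that caching does not skew the comparison.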