Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why use the SQL Server 2008 geography data type?

I am redesigning a customer database and one of the new pieces of information I would like to store along with the standard address fields (Street, City, etc.) is the geographic location of the address. The only use case I have in mind is to allow users to map the coordinates on Google maps when the address cannot otherwise be found, which often happens when the area is newly developed, or is in a remote/rural location.

My first inclination was to store latitude and longitude as decimal values, but then I remembered that SQL Server 2008 R2 has a geography data type. I have absolutely no experience using geography, and from my initial research, it looks to be overkill for my scenario.

For example, to work with latitude and longitude stored as decimal(7,4), I can do this:

insert into Geotest(Latitude, Longitude) values (47.6475, -122.1393) select Latitude, Longitude from Geotest 

but with geography, I would do this:

insert into Geotest(Geolocation) values (geography::Point(47.6475, -122.1393, 4326)) select Geolocation.Lat, Geolocation.Long from Geotest 

Although it's not that much more complicated, why add complexity if I don't have to?

Before I abandon the idea of using geography, is there anything I should consider? Would it be faster to search for a location using a spatial index vs. indexing the Latitude and Longitude fields? Are there advantages to using geography that I am not aware of? Or, on the flip side, are there caveats that I should know about which would discourage me from using geography?


Update

@Erik Philips brought up the ability to do proximity searches with geography, which is very cool.

On the other hand, a quick test is showing that a simple select to get the latitude and longitude is significantly slower when using geography (details below). , and a comment on the accepted answer to another SO question on geography has me leery:

@SaphuA You're welcome. As a sidenote be VERY carefull of using a spatial index on a nullable GEOGRAPHY datatype column. There are some serious performance issue, so make that GEOGRAPHY column non-nullable even if you have to remodel your schema. – Tomas Jun 18 at 11:18

All in all, weighing the likelihood of doing proximity searches vs. the trade-off in performance and complexity, I've decided to forgo the use of geography in this case.


Details of the test I ran:

I created two tables, one using geography and another using decimal(9,6) for latitude and longitude:

CREATE TABLE [dbo].[GeographyTest] (     [RowId] [int] IDENTITY(1,1) NOT NULL,     [Location] [geography] NOT NULL,     CONSTRAINT [PK_GeographyTest] PRIMARY KEY CLUSTERED ( [RowId] ASC ) )   CREATE TABLE [dbo].[LatLongTest] (     [RowId] [int] IDENTITY(1,1) NOT NULL,     [Latitude] [decimal](9, 6) NULL,     [Longitude] [decimal](9, 6) NULL,     CONSTRAINT [PK_LatLongTest] PRIMARY KEY CLUSTERED ([RowId] ASC) )  

and inserted a single row using the same latitude and longitude values into each table:

insert into GeographyTest(Location) values (geography::Point(47.6475, -122.1393, 4326)) insert into LatLongTest(Latitude, Longitude) values (47.6475, -122.1393) 

Finally, running the following code shows that, on my machine, selecting the latitude and longitude is approximately 5 times slower when using geography.

declare @lat float, @long float,         @d datetime2, @repCount int, @trialCount int,          @geographyDuration int, @latlongDuration int,         @trials int = 3, @reps int = 100000  create table #results  (     GeographyDuration int,     LatLongDuration int )  set @trialCount = 0  while @trialCount < @trials begin      set @repCount = 0     set @d = sysdatetime()      while @repCount < @reps     begin         select @lat = Location.Lat,  @long = Location.Long from GeographyTest where RowId = 1         set @repCount = @repCount + 1     end      set @geographyDuration = datediff(ms, @d, sysdatetime())      set @repCount = 0     set @d = sysdatetime()      while @repCount < @reps     begin         select @lat = Latitude,  @long = Longitude from LatLongTest where RowId = 1         set @repCount = @repCount + 1     end      set @latlongDuration = datediff(ms, @d, sysdatetime())      insert into #results values(@geographyDuration, @latlongDuration)      set @trialCount = @trialCount + 1  end  select *  from #results  select avg(GeographyDuration) as AvgGeographyDuration, avg(LatLongDuration) as AvgLatLongDuration from #results  drop table #results 

Results:

GeographyDuration LatLongDuration ----------------- --------------- 5146              1020 5143              1016 5169              1030  AvgGeographyDuration AvgLatLongDuration -------------------- ------------------ 5152                 1022 

What was more surprising is that even when no rows are selected, for example selecting where RowId = 2, which doesn't exist, geography was still slower:

GeographyDuration LatLongDuration ----------------- --------------- 1607              948 1610              946 1607              947  AvgGeographyDuration AvgLatLongDuration -------------------- ------------------ 1608                 947 
like image 464
Jeff Ogata Avatar asked Sep 13 '11 21:09

Jeff Ogata


People also ask

What is geography data type in SQL?

The geography spatial data type, geography, is implemented as a . NET common language runtime (CLR) data type in SQL Server. This type represents data in a round-earth coordinate system. The SQL Server geography data type stores ellipsoidal (round-earth) data, such as GPS latitude and longitude coordinates.

What is difference between geography and geometry in SQL Server?

Geometry : Stores data based on a flat (Euclidean) coordinate system. The data type is often used to store the X and Y coordinates that represent lines, points, and polygons in two-dimensional spaces. Geography : Stores data based on a round-earth coordinate system.

Why do we use data types in SQL?

Data types are used to represent the nature of the data that can be stored in the database table. For example, in a particular column of a table, if we want to store a string type of data then we will have to declare a string data type of this column.

What is spatial data type in SQL Server?

Spatial data represents information about the physical location and shape of geometric objects. These objects can be point locations or more complex objects such as countries, roads, or lakes. SQL Server supports two spatial data types: the geometry data type and the geography data type.


2 Answers

If you plan on doing any spatial computation, EF 5.0 allows LINQ Expressions like:

private Facility GetNearestFacilityToJobsite(DbGeography jobsite) {        var q1 = from f in context.Facilities                          let distance = f.Geocode.Distance(jobsite)              where distance < 500 * 1609.344                   orderby distance               select f;        return q1.FirstOrDefault(); } 

Then there is a very good reason to use Geography.

Explanation of spatial within Entity Framework.

Updated with Creating High Performance Spatial Databases

As I noted on Noel Abrahams Answer:

A note on space, each coordinate is stored as a double-precision floating-point number that is 64 bits (8 bytes) long, and 8-byte binary value is roughly equivalent to 15 digits of decimal precision, so comparing a decimal(9,6) which is only 5 bytes, isn't exactly a fair comparison. Decimal would have to be a minimum of Decimal(15,12) (9 bytes) for each LatLong (total of 18 bytes) for a real comparison.

So comparing storage types:

CREATE TABLE dbo.Geo (     geo geography ) GO  CREATE TABLE dbo.LatLng (         lat decimal(15, 12),        lng decimal(15, 12) ) GO  INSERT dbo.Geo SELECT geography::Point(12.3456789012345, 12.3456789012345, 4326)  UNION ALL SELECT geography::Point(87.6543210987654, 87.6543210987654, 4326)   GO 10000  INSERT dbo.LatLng SELECT  12.3456789012345, 12.3456789012345  UNION SELECT 87.6543210987654, 87.6543210987654  GO 10000  EXEC sp_spaceused 'dbo.Geo'  EXEC sp_spaceused 'dbo.LatLng' 

Result:

name    rows    data      Geo     20000   728 KB    LatLon  20000   560 KB 

The geography data-type takes up 30% more space.

Additionally the geography datatype is not limited to only storing a Point, you can also store LineString, CircularString, CompoundCurve, Polygon, CurvePolygon, GeometryCollection, MultiPoint, MultiLineString, and MultiPolygon and more. Any attempt to store even the simplest of Geography types (as Lat/Long) beyond a Point (for example LINESTRING(1 1, 2 2) instance) will incur additional rows for each point, a column for sequencing for the order of each point and another column for grouping of lines. SQL Server also has methods for the Geography data types which include calculating Area, Boundary, Length, Distances, and more.

It seems unwise to store Latitude and Longitude as Decimal in Sql Server.

Update 2

If you plan on doing any calculations like distance, area, etc, properly calculating these over the surface of the earth is difficult. Each Geography type stored in SQL Server is also stored with a Spatial Reference ID. These id's can be of different spheres (the earth is 4326). This means that the calculations in SQL Server will actually calculate correctly over the surface of the earth (instead of as-the-crow-flies which could be through the surface of the earth).

enter image description here

like image 180
Erik Philips Avatar answered Sep 22 '22 12:09

Erik Philips


Another thing to consider is the storage space taken up by each method. The geography type is stored as a VARBINARY(MAX). Try running this script:

CREATE TABLE dbo.Geo (     geo geography  )  GO  CREATE TABLE dbo.LatLon (     lat decimal(9, 6) ,   lon decimal(9, 6)  )  GO  INSERT dbo.Geo SELECT geography::Point(36.204824, 138.252924, 4326) UNION ALL SELECT geography::Point(51.5220066, -0.0717512, 4326)   GO 10000  INSERT dbo.LatLon SELECT  36.204824, 138.252924 UNION SELECT 51.5220066, -0.0717512  GO 10000  EXEC sp_spaceused 'dbo.Geo' EXEC sp_spaceused 'dbo.LatLon' 

Result:

name    rows    data      Geo     20000   728 KB    LatLon  20000   400 KB 

The geography data-type takes up almost twice as much space.

like image 36
Noel Abrahams Avatar answered Sep 20 '22 12:09

Noel Abrahams