EDIT: I have been using Postgres with PostGIS for a few months now, and I am satisfied.
I need to analyze a few million geocoded records, each of which will have latitude and longitude. These records include data of at least three different types, and I will be trying to see if each set influences the other.
What database is best for the underlying data store for all this data? Here's my desires:
I've already done some development using MySql, but I can change if necessary.
PostGIS is an open source, freely available spatial database extender for the PostgreSQL Database Management System (a.k.a DBMS). So PostgreSQL (a.k.a. Postgres) is THE database and PostGIS is like an add-on to that database. The latest release version of PostGIS now comes packaged with PostgreSQL.
Both PostgreSQL and MySQL are time-proven solutions that can compete with enterprise solutions such as Oracle and SQL Server. MySQL has been famous for its ease of use and speed, while PostgreSQL has many more advanced features, which is the reason that PostgreSQL is often described as an open-source version of Oracle.
The latest release version now comes packaged with the PostgreSQL DBMS installs as an optional add-on. As of this writing PostGIS 3.1. 4 is the latest stable release and PostGIS 3.2.
I have worked with all three databases and done migrations between them, so hopefully I can still add something to an old post. Ten years ago I was tasked with putting a largish -- 450 million spatial objects -- dataset from GML to a spatial database. I decided to try out MySQL and Postgis, at the time there was no spatial in SQL Server and we had a small startup atmosphere, so MySQL seemed a good fit. I subsequently was involved in MySQL, I attended/spoke at a couple of conferences and was heavily involved in the beta testing of the more GIS-compliant functions in MySQL that was finally released with version 5.5. I have subsequently been involved with migrating our spatial data to Postgis and our corporate data (with spatial elements) to SQL Server. These are my findings.
MySQL
1). Stability issues. Over the course of 5 years, we had several database corruptions issues, which could only be fixed by running myismachk on the index file, a process than can take well over 24 hours on a 450 million row table.
2). Until recently only MyISAM tables supported the spatial data type. This means if you want transaction support you are out of luck. InnoDB table type does now support spatial types, but not indexes on them, which given the typical sizes of spatial data sets, isn't terribly useful. See http://dev.mysql.com/doc/refman/5.0/en/innodb-restrictions.html My experience from going to conferences was that spatial was very much an afterthought -- we've implemented replication, partitioning, etc, but it doesn't work with spatial. EDIT: In the upcoming 5.7.5 release InnoDB will finally support indexes on spatial columns, meaning that ACID, foreign keys and spatial indexes will finally be available in the same engine.
3). The spatial functionality is extremely limited compared to both Postgis and SQL Server spatial. There is still no ST_Union function that acts on an entire geometry field, one of the queries I run most often, ie, you can't write:
select attribute, ST_Union(geom) from some_table group by some_attribute
which is very useful in a GIS context. Select ST_Union(geom1, const_geom) from some_table
, ie, one of the geometries is a hard-coded constant geometry is a bit limiting in comparison.
4). No support for rasters. Being able to do combined vector-raster analysis within a db is very useful GIS functionality.
5). No support for conversion from one spatial reference system to another.
6). Since acquisistion by Oracle, spatial has really been put on hold.
Overall, to be fair to MySQL it supported our website, WMS and general spatial processing for several years, and was easy to set up. On the downside, data corruption was an issue, and by being forced to use MyISAM tables you are giving up a lot of the benefits of an RDBMS.
Postgis
Given the issues we had with MySQL, we ultimately converted to Postgis. The key points of this experience have been.
1). Extreme stability. No data corruption in 5 years and we now have around 25 Postgres/GIS boxes on centos virtual machines, under varying degrees of load.
2). Rapid pace of development -- raster, topology, 3D support being recent examples of this.
3). Very active community. The Postgis irc channel and mailing list are excellent resources. The Postgis reference manual is also excellent. http://postgis.net/docs/manual-2.0/
4). Plays very well with other applications, under the OSGeo umbrella, such as GeoServer and GDAL.
5). Stored procedures can be written in many languages, apart from the default plpgsql, such as Python or R.
5). Postgres is a very standards compliant, fully featured RDBMS, which aims to stay close to the ANSI standards.
6). Support for window functions and recursive queries -- not in MySQL, but in SQL Server. This has made writing more complex spatial queries cleaner.
SQL Server.
I have only used SQL Server 2008 spatial functionality, and many of the annoyances of that release -- lack of support for conversions from one CRS to another, the need to add your own parameters to spatial indexes -- have now been resolved.
1). As spatial objects in SQL Server are basically CLR objects, the syntax feels backwards. Instead of ST_Area(geom) you write geom.STArea() and this becomes even more obvious when you chain functions together. The dropping of the underscore in function names is merely a minor annoyance.
2). I have had a number of invalid polygons that have been accepted by SQL Server, and the lack of a ST_MakeValid function can make this a bit painful.
3). Windows only. In general, Microsoft products (like ESRI ones) are designed to work very well with each other, but don't always have standard's compliance and interoperability as primary objectives. If you are running a windows only shop, this is not an issue.
UPDATE: having played a bit with SQL Server 2012, I can say that it has been improved significantly. There is now a good geometry validation function, there is good support for the Geography data type, including a FULL GLOBE object, which allows representing objects that occupy more than one hemisphere and support for Compound Curves and Circular Strings which is useful for accurate and compact representations of arcs (and circles) among other things. Transforming coordinates from one CRS to another still needs to be done in 3rd party libraries, though this is not a show stopper in most applications.
I haven't used SQL Server with large enough datasets to compare one on one with Postgis/MySQL, but from what I have seen the functions behave correctly, and while not quite as fully featured as Postgis, it is a huge improvement on MySQL's offerings.
Sorry for such a long answer, I hope some of the pain and joy I have suffered over the years might be of help to someone.
If you are interested in a thorough comparison, I recommend "Cross Compare SQL Server 2008 Spatial, PostgreSQL/PostGIS 1.3-1.4, MySQL 5-6" and/or "Compare SQL Server 2008 R2, Oracle 11G R2, PostgreSQL/PostGIS 1.5 Spatial Features" by Boston GIS.
Considering your points:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With