Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MySQL Postgresql / PostGIS

I have lat/lon coordinates in a 400 million rows partitioned mysql table. The table grows @ 2000 records a minute and old data is flushed every few weeks. I am exploring ways to do spatial analysis of this data as it comes in.

Most of the analysis requires finding whether a point is in a particular lat/lon polygon or which polygons contain that point.

I see the following ways of tackling the point in polygon (PIP) problem:

  1. Create a mysql function that takes a point and a Geometry and returns a boolean. Simple but not sure how Geometry can be used to perform operations on lat/lon co-ordinates since Geometry assumes flat surfaces and not spheres.

  2. Create a mysql function that takes a point and identifier of a custom data structure and returns a boolean. The polygon vertices can be stored in a table and a function can compute PIP using spherical math. Large number of polygon points may lead to a huge table and slow queries.

  3. Leave point data in mysql and store polygon data in PostGIS and use the app server to run PIP query in PostGIS by probviding point as a parameter.

  4. Port the application from MySQL to Postgresql/PostGIS. This will require a lot of effort in rewriting queries and procedures. I can still do it but how good is Postgresql at handling 400 million rows. A quick search on google for "mysql 1 billion rows" returns many results. same query for Postgres returns no relevant results.

Would like to hear some thoughts & suggestions.

like image 935
Dojo Avatar asked Mar 14 '12 04:03

Dojo


People also ask

What is PostGIS in PostgreSQL?

PostGIS is a spatial database extender for PostgreSQL object-relational database. It adds support for geographic objects allowing location queries to be run in SQL. SELECT superhero. name FROM city, superhero WHERE ST_Contains (city. geom, superhero. geom) AND city. name = 'Gotham'; Docs for latest stable release

What are PostgreSQL and MySQL used for?

PostgreSQL and MySQL are relational databases that organize data into tables. These tables can be linked — or related — based on data that is common to each. Relational databases enable your business to better understand the relationships among available data and help gain new insights for making better decisions or identifying new opportunities.

What is PostgreSQL geo-spatial?

It turns our vanilla flavored postgres into a spatial database and includes spatial datatypes (geometry, geography), spatial indexes (r-tree, quad-tree, kd-tree), and spatial functions. Working with GIS normally requires several layers of technology of Geo-Spatial Software, as for example:

What is IBM cloud databases for PostgreSQL?

IBM Cloud Databases for PostgreSQL is a fully managed database offering that takes the heavy lifting out of database management, letting developers get back to creating new, innovative products. IBM’s breadth is in open-source databases, and they actively engage with the large community of developers that support them.


2 Answers

A few thoughts.

First PostgreSQL and MySQL are completely different beasts when it comes to performance tuning. So if you go the porting route be prepared to rethink your indexing strategies. Not only does PostgreSQL have a far more flexible indexing than MySQL, but the table approaches are very different also, meaning the appropriate indexing strategies are as different as the tactics are. Unfortunately this means you can expect to struggle a bit. If i could give advice I would suggest dropping all non-key indexes at first and then adding them back sparingly as needed.

The second point is that nobody here can likely give you a huge amount of practical advice at this point because we don't know the internals of your program. In PostgreSQL, you are best off indexing only what you need, but you can index functions' outputs (which is really helpful in cases like this) and you can index only part of a table.

I am more a PostgreSQL guy than a MySQL guy so of course I think you should go with PostgreSQL. However rather than tell you why etc. and have you struggle at this scale, I will tell you a few things that I would look at using if I were trying to do this.

  • Functional indexes
  • Write my own functions for indexes for related analysis
  • PostGIS is pretty amazing and very flexible

In the end, switching db's at this volume is going to be a learning curve, and you need to be prepared for that. However, PostgreSQL can handle the volume just fine.

like image 185
Chris Travers Avatar answered Oct 26 '22 15:10

Chris Travers


The number of rows is quite irrelevant here. The question is how much of the point in polygon work that can be done by the index.

The answer to that depends on how big the polygons are.

PostGIS is very fast to find all points in the bounding box of a polygon. Then it takes more effort to find out if the point actually is inside the polygon.

If your polygons is small (small bounding boxes) the query will be efficient. If your polygons are big or have a shape that mekes the bounding box big then it will be less efficient.

If your polygons is more or less static there is work arounds. You can divide your polygons in smaller polygons and recreate the idnex. Then the index will be more efficient.

If your polygons is actually multipolygons the firs step is to split the multipolygons to polygons with ST_Dump and recreate and build an index on the result.

HTH

Nicklas

like image 34
Nicklas Avén Avatar answered Oct 26 '22 17:10

Nicklas Avén