Regarding storing Lat / Lng coordinates in Postgresql (Column type)

I am relatively new to postgresql which is why I turn to those more experienced with it than I.

I am storing coordinates in a postgresql database.

They look like this: 35.21076593772987,11.22855348629825 35.210780222605616,11.22826420209139 35.210777635062875,11.228241328291957 35.210766843596794,11.228219799676775 35.210765045075604,11.228213072050166 35.21076234732945,11.228200962345223 35.21076324691649,11.228186161764323 35.21077314123606,11.228083902231146 35.210863083636866,11.227228492401766

They can range in length from around 800 characters up to 7000.

They always include:

  • Numbers (0-9)
  • Spaces ( )
  • Punctuation marks and commas (. ,)

But they can also include:

  • Vertical bars ( | )

Right now I am storing them as TEXT, but to my understanding TEXT is stored externally which has an effect on performance. Would you recommend switching to another column type? If so, which one?

Thank you very much.

asked Jan 26 '14 by alexisdevarennes


3 Answers

Why not use PostGIS for this?

You're overlooking what's possibly the ideal storage for this kind of data - PostGIS's data types, particularly the geography type.

SELECT ST_GeogFromText('POINT(35.21076593772987 11.22855348629825)');

By using geography you're storing your data in a representative type that supports all sorts of powerful operations and indexes on the type. Of course, that's only one point; I strongly suspect your data is actually a line or a shape in which case you should use the appropriate PostGIS geography constructor and input format.
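
For example, a track built from the first few coordinate pairs in the question could be stored as a single geography value. This is only a sketch - whether a LINESTRING is the right representation depends on what the points mean - and note that WKT takes longitude before latitude; the values are kept in the question's order here, as in the POINT example above:

SELECT ST_GeogFromText(
    'LINESTRING(35.21076593772987 11.22855348629825,
                35.210780222605616 11.22826420209139,
                35.210777635062875 11.228241328291957)'
);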

The big advantage to using geography is that it's a type designed specifically for asking real world questions about things like distance, "within", etc; you can use things like ST_Distance_Spheroid to get real earth-distance between points.
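
A sketch of that kind of query, using the first and last points from the question (with the geography type, plain ST_Distance already returns metres measured on the spheroid):

SELECT ST_Distance(
    ST_GeogFromText('POINT(35.21076593772987 11.22855348629825)'),
    ST_GeogFromText('POINT(35.210863083636866 11.227228492401766)')
) AS metres;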

Avoiding PostGIS?

If you want to avoid PostGIS, and just store it with native types, I'd recommend an array of point:

postgres=> SELECT ARRAY[
     point('35.21076593772987','11.22855348629825'), 
     point('35.210780222605616','11.22826420209139'), 
     point('35.210777635062875','11.228241328291957') 
];
                                                       array                                                        
--------------------------------------------------------------------------------------------------------------------
 {"(35.2107659377299,11.2285534862982)","(35.2107802226056,11.2282642020914)","(35.2107776350629,11.228241328292)"}
(1 row)

... unless your points actually represent a line or shape, in which case use the appropriate type - path or polygon respectively.

This remains a useful compact representation - much more so than text in fact - that is still easily worked with within the DB.

Compare storage:

CREATE TABLE points_text AS SELECT '35.21076593772987,11.22855348629825 35.210780222605616,11.22826420209139 35.210777635062875,11.228241328291957 35.210766843596794,11.228219799676775 35.210765045075604,11.228213072050166 35.21076234732945,11.228200962345223 35.21076324691649,11.228186161764323 35.21077314123606,11.228083902231146 35.210863083636866,11.227228492401766'::text AS p;

postgres=> SELECT pg_column_size(points_text.p) FROM points_text;
 pg_column_size 
----------------
            339
(1 row)

CREATE TABLE points_array AS
SELECT array_agg(point(px)) AS p from points_text, LATERAL regexp_split_to_table(p, ' ') split(px);

postgres=> SELECT pg_column_size(p) FROM points_array;
 pg_column_size 
----------------
            168
(1 row)

path is even more compact, and probably a truer way to model what your data really is:

postgres=> SELECT pg_column_size(path('35.21076593772987,11.22855348629825 35.210780222605616,11.22826420209139 35.210777635062875,11.228241328291957 35.210766843596794,11.228219799676775 35.210765045075604,11.228213072050166 35.21076234732945,11.228200962345223 35.21076324691649,11.228186161764323 35.21077314123606,11.228083902231146 35.210863083636866,11.227228492401766'));
 pg_column_size 
----------------
             96
(1 row)

unless it's a closed shape, in which case use polygon.
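
For example, a closed shape can be stored with a plain cast; the coordinate pairs below are truncated, illustrative values:

SELECT '((35.2108,11.2286),(35.2108,11.2283),(35.2109,11.2272))'::polygon;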

Don't...

Either way, please don't just model this as text. It'll make you cry later, when you're trying to solve problems like "how do I determine if this point falls within x distance of the path in this column". PostGIS makes this sort of thing easy, but only if you store your data sensibly in the first place.
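
For example, with a geography column that question becomes a single WHERE clause. The tracks table, geog column and id column below are assumed names for illustration, not from the question:

-- Which stored tracks pass within 500 metres of a given point?
SELECT id
FROM tracks
WHERE ST_DWithin(
    geog,                                       -- assumed geography(LineString,4326) column
    ST_GeogFromText('POINT(35.2107 11.2285)'),
    500                                         -- metres
);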

See this closely related question, which discusses the good reasons not to just shove stuff in text fields.

Also don't worry too much about in-line vs out-of-line storage. There isn't tons you can do about it, and it's something you should be dealing with only once you get the semantics of your data model right.

answered Sep 21 '22 by Craig Ringer


All of the character types (TEXT, VARCHAR, CHAR) behave similarly from a performance point of view. They are normally stored in-line in the table row, unless they are very large, in which case they may be stored in a separate file (called a TOAST file).

The reasons for this are:

  1. Table rows have to be able to fit inside the database page size (8kb by default)

  2. Having a very large field in a row stored inline would make it slower to access other fields in the table. Imagine a table which contains two columns - a filename and the file content - and you wanted to locate a particular file. If you had the file content stored inline, then you would have to scan every file to find the one you wanted. (Ignoring the effect of indexes that might exist for this example).

Details of TOAST storage can be found here. Note that moving data out of line is not the only strategy: the data may be compressed in place, moved out of line, or both.

TOAST-ing kicks in when a row exceeds a threshold (2kb by default), so it is likely that your rows will be affected by this since you state they can be up to 7000 chars (although it might be that most of them are only compressed, not stored out of line).

You can control how individual columns are subjected to this treatment using the command ALTER TABLE ... ALTER COLUMN ... SET STORAGE.
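
For example (table and column names are illustrative, not from the question):

ALTER TABLE tracks ALTER COLUMN coords SET STORAGE EXTERNAL;  -- allow out-of-line storage, never compress
ALTER TABLE tracks ALTER COLUMN coords SET STORAGE MAIN;      -- prefer compression, move out of line only as a last resort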

This storage strategy applies to all of the data types which you might use to store the type of data you are describing. It would take a better knowledge of your application to make reliable suggestions for other strategies, but here are some ideas:

  • It might be better to refactor the data - instead of storing all of the co-ordinates in one large string and processing it in your application, store them as individual rows in a referenced table (see the sketch after this list). Since your application is in any case splitting and parsing the data into co-ordinate pairs for use, letting the database do this for you makes a kind of sense.

    This would particularly be a good idea if subsets of the data in each co-ordinate set need to be selected or updated instead of always consumed or updated in a single operation, or if doing so allowed you to index the data more effectively.

  • Since we are talking about co-ordinate data, you could consider using PostGIS, an extension for PostgreSQL which specifically caters for this kind of data. It also includes operators allowing you to filter rows which are, for example, inside or outside bounding boxes.
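
A minimal sketch of the first suggestion, with one row per co-ordinate pair referencing a parent row for each set (all names are illustrative):

CREATE TABLE coordinate_sets (
    set_id serial PRIMARY KEY
);

CREATE TABLE coordinates (
    set_id integer NOT NULL REFERENCES coordinate_sets,
    seq    integer NOT NULL,           -- position within the set
    coord  point   NOT NULL,           -- or two double precision columns
    PRIMARY KEY (set_id, seq)
);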

answered Sep 21 '22 by harmic


Don't focus on the fact that these numbers are coordinates. Instead, notice that they are strings of numbers in a very limited range, and all of roughly the same magnitude. You are most likely interested in how these numbers change (looks like a trajectory of an object off the coast of Tunisia if I just punch these coordinates into a map).

I would recommend that you convert the numbers to double precision (53 bits of precision ~ 9 parts in 10^15 - close to the LSD of your numbers), and subtract each value from the first value in the series. This will result in much smaller numbers being stored, and greater relative accuracy. You could get away with storing the differences as long integers, probably (multiplying appropriately) but it will be faster to keep them as doubles.

And if you just take each 'trajectory' (I am just calling a collection of GPS points a trajectory, I have no idea if that is what they represent in your case) and give it a unique ID, then you can have a table with columns:

unique ID  |  trajectory ID  |     latitude      |      longitude
   1              1            11.2285534862982     35.2107802226056
   2              1            11.2282642020913     35.2107776350628
   3              1            11.2282413282919     35.2107668435967
   4              1            11.2282197996767     35.2107650450756
   5              1            11.2282130720501     35.2107623473294
   6              1            11.2282009623452     35.2107632469164
   7              1            11.2281861617643     35.2107731412360
   8              1            11.2280839022311     35.2108630836368

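A minimal SQL sketch of that layout, together with the "subtract from the first value" idea; table and column names are illustrative, not taken from the answer:

CREATE TABLE trajectory_points (
    id            bigserial PRIMARY KEY,        -- the "unique ID" above
    trajectory_id integer NOT NULL,
    latitude      double precision NOT NULL,
    longitude     double precision NOT NULL
);

-- Each point expressed as an offset from the first point of its trajectory.
SELECT id,
       latitude  - first_value(latitude)  OVER w AS lat_delta,
       longitude - first_value(longitude) OVER w AS lon_delta
FROM trajectory_points
WINDOW w AS (PARTITION BY trajectory_id ORDER BY id);
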
Conversion from text to a number is MUCH slower than you think - it requires many operations. If you end up using the data as numbers, I highly recommend storing them as numbers...

answered Sep 19 '22 by Floris