Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Lossless Compression for Coordinate Path Data

I am brainstorming for a project which will store large chunks of coordinate data (latitude, longitude) in a database. Key aspects of this data will be calculated and stored, and then the bulk of the data will be compressed and stored. I am looking for a lossless compression algorithm to reduce the storage space of this data. Is there an (preferably common) algorithm which is good at compressing this type of data?

Known attributes of the data

  • The coordinate pairs are ordered and that order should be preserved.
  • All numbers will be limited to 5 decimal places (roughly 1m accuracy).
  • The coordinate pairs represent a path, and adjacent pairs will likely be relatively close to each other in value.

Example Data

[[0.12345, 34.56789], [0.01234, 34.56754], [-0.00012, 34.56784], …]

Note: I am not so concerned about language at this time, but I will potentially implement this in Javascript and PHP.

Thanks in advance!

like image 540
Nate Avatar asked Feb 15 '23 05:02

Nate


2 Answers

To expand on the delta encoding suggested by barak manos, you should start by encoding the coordinates as binary numbers instead of strings. Use four-byte signed integers, which each equal to 105 times your values.

Then apply delta encoding, where each latitude and longitude respectively are subtracted from the previous one. The first lat/long is left as is.

Now break the data into four planes, one for each of the four-bytes in the 32-bit integers. The higher bytes will be mostly zeros, with all of the entropy in the lower bytes. You can break the data into blocks, so that your planes don't have to span the entire data set.

Then apply zlib or lzma compression.

like image 178
Mark Adler Avatar answered Feb 24 '23 07:02

Mark Adler


I would recommend that you first exploit the fact that adjacent symbols are similar, and convert your data in order to reduce the entropy. Then, apply the compression algorithm of your choice on the output.

Let IN_ARR be the original array and OUT_ARR be the converted array (input for compression):

OUT_ARR[0] = IN_ARR[0]
for i = 1 to N-1
    OUT_ARR[i] = IN_ARR[i] - IN_ARR[i-1]

For simplicity, the pseudo-code above is written for 1-dimension coordinates.

But of course, you can easily implement it for 2-dimension coordinates...

And of course, you will have to apply the inverse operation after decompression:

IN_ARR[0] = OUT_ARR[0]
for i = 1 to N-1
    IN_ARR[i] = OUT_ARR[i] + IN_ARR[i-1]
like image 30
barak manos Avatar answered Feb 24 '23 08:02

barak manos