Lossless Compression for Coordinate Path Data

Question

I am brainstorming for a project which will store large chunks of coordinate data (latitude, longitude) in a database. Key aspects of this data will be calculated and stored, and then the bulk of the data will be compressed and stored. I am looking for a lossless compression algorithm to reduce the storage space of this data. Is there an (preferably common) algorithm which is good at compressing this type of data?

Known attributes of the data

The coordinate pairs are ordered and that order should be preserved.
All numbers will be limited to 5 decimal places (roughly 1m accuracy).
The coordinate pairs represent a path, and adjacent pairs will likely be relatively close to each other in value.

Example Data

[[0.12345, 34.56789], [0.01234, 34.56754], [-0.00012, 34.56784], …]

Note: I am not so concerned about language at this time, but I will potentially implement this in Javascript and PHP.

Thanks in advance!

Mark Adler · Accepted Answer

To expand on the delta encoding suggested by barak manos, you should start by encoding the coordinates as binary numbers instead of strings. Use four-byte signed integers, which each equal to 10⁵ times your values.

Then apply delta encoding, where each latitude and longitude respectively are subtracted from the previous one. The first lat/long is left as is.

Now break the data into four planes, one for each of the four-bytes in the 32-bit integers. The higher bytes will be mostly zeros, with all of the entropy in the lower bytes. You can break the data into blocks, so that your planes don't have to span the entire data set.

Then apply zlib or lzma compression.

barak manos · Answer

I would recommend that you first exploit the fact that adjacent symbols are similar, and convert your data in order to reduce the entropy. Then, apply the compression algorithm of your choice on the output.

Let IN_ARR be the original array and OUT_ARR be the converted array (input for compression):

OUT_ARR[0] = IN_ARR[0]
for i = 1 to N-1
    OUT_ARR[i] = IN_ARR[i] - IN_ARR[i-1]

For simplicity, the pseudo-code above is written for 1-dimension coordinates.

But of course, you can easily implement it for 2-dimension coordinates...

And of course, you will have to apply the inverse operation after decompression:

IN_ARR[0] = OUT_ARR[0]
for i = 1 to N-1
    IN_ARR[i] = OUT_ARR[i] + IN_ARR[i-1]

Lossless Compression for Coordinate Path Data

Tags:

algorithm

compression

coordinates

Known attributes of the data

Example Data

Nate

2 Answers

Mark Adler

barak manos

Recent Activity

Donate For Us

Lossless Compression for Coordinate Path Data

Tags:

algorithm

compression

coordinates

Known attributes of the data

Example Data

Nate

2 Answers

Mark Adler

barak manos

Related questions

Recent Activity

Donate For Us