I have a large array with a range of integers that are mostly continuous, eg 1-100, 110-160, etc. All integers are positive. What would be the best algorithm to compress this?
I tried the deflate algorithm but that gives me only 50% compression. Note that the algorithm cannot be lossy.
All numbers are unique and progressively increasing.
Also if you can point me to the java implementation of such algorithm that would be great.
The fastest algorithm, lz4, results in lower compression ratios; xz, which has the highest compression ratio, suffers from a slow compression speed. However, Zstandard, at the default setting, shows substantial improvements in both compression speed and decompression speed, while compressing at the same ratio as zlib.
PFOR-DELTA (delta encoding on top of PFOR) – In this method, the integers are made smaller by considering the differences between subsequent values.
Lempel–Ziv–Welch (LZW) is a lossless compression algorithm developed in 1984. It is used in the GIF format, introduced in 1987. DEFLATE, a lossless compression algorithm specified in 1996, is used in the Portable Network Graphics (PNG) format.
We have written recent research papers that survey the best schemes for this problem. Please see:
Daniel Lemire and Leonid Boytsov, Decoding billions of integers per second through vectorization,Software: Practice & Experience 45 (1), 2015. http://arxiv.org/abs/1209.2137
Daniel Lemire, Nathan Kurz, Leonid Boytsov, SIMD Compression and the Intersection of Sorted Integers, Software: Practice and Experience (to appear) http://arxiv.org/abs/1401.6399
They include an extensive experimental evaluation.
You can find a complete implementation of all techniques in C++11 online: https://github.com/lemire/FastPFor and https://github.com/lemire/SIMDCompressionAndIntersection
There are also C libraries: https://github.com/lemire/simdcomp and https://github.com/lemire/MaskedVByte
If you prefer Java, please see https://github.com/lemire/JavaFastPFOR
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With