Are there any lossless compression methods that can be applied to floating point time-series data, and will significantly outperform, say, writing the data as binary into a file and running it through gzip?
Reduction of precision might be acceptable, but it must happen in a controlled way (i.e. I must be able to set a bound on how many digits must be kept).
I am working with some large data files which are series of correlated 'double' values describing a function of time (i.e. the values are correlated). I don't generally need the full 'double' precision, but I might need more than 'float'.
Since there are specialized lossless methods for images/audio, I was wondering if anything specialized exists for this situation.
Clarification: I am looking for existing practical tools rather than a paper describing how to implement something like this. Something comparable to gzip in speed would be excellent.
With lossless compression, every bit of data originally in a file remains after it is uncompressed, and all the information is restored. Lossy compression reduces a file by permanently eliminating certain information, especially redundant information.
One could compress a file in which most bits are clear by taking each pair of bits and writing "0" if both bits were clear, "10" if the first bit was set and the second one not, "110" if the second was set and the first not, or "111" if both bits were set.
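As an illustration of that pair-of-bits scheme, here is a minimal sketch in Python; the function name encode_bit_pairs and the string-of-bits input format are made up for this example.

def encode_bit_pairs(bits):
    """Encode a string of '0'/'1' characters, two bits at a time, using the
    prefix code described above: 00 -> '0', 10 -> '10', 01 -> '110', 11 -> '111'."""
    codes = {'00': '0', '10': '10', '01': '110', '11': '111'}
    assert len(bits) % 2 == 0
    return ''.join(codes[bits[i:i + 2]] for i in range(0, len(bits), 2))

print(encode_bit_pairs('00001000'))  # '00100' -- 8 input bits become 5 code bits when zeros dominate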
zfp is a compressed format for representing multidimensional floating-point and integer arrays. zfp provides compressed-array classes that support high throughput read and write random access to individual array elements.
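zfp also ships Python bindings (the zfpy module); below is a minimal sketch of how it might be used on a one-dimensional series. The function names compress_numpy / decompress_numpy and the tolerance keyword are taken from the zfpy API as I remember it, so treat them as assumptions and check the zfp documentation.

import numpy as np
import zfpy  # Python bindings for zfp; assumed installed, e.g. via 'pip install zfpy'

# A slowly varying, correlated stand-in for the real time series
data = np.cumsum(np.random.randn(100_000))

# Fixed-accuracy mode: the absolute error of each value should stay within the tolerance
compressed = zfpy.compress_numpy(data, tolerance=1e-6)
restored = zfpy.decompress_numpy(compressed)

assert np.max(np.abs(restored - data)) <= 1e-6
print(len(compressed), "compressed bytes for", data.nbytes, "bytes of input")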
Here are some ideas if you want to create your own simple algorithm:
You might want to have a look at these resources:
You might also want to try LogLuv-compressed TIFF for this, though I haven't used it myself.
Since you state that you need a precision somewhere between 'float' and 'double': you can zero out any number of least significant bits in single- and double-precision floats. IEEE-754 floating point numbers are represented in binary roughly like seeefffffffff, which represents the value sign * 1.fffffff * 2^(eee).
You can zero out the least significant fraction (f) bits. For single-precision (32-bit) floats, there are 23 fraction bits of which you can zero out up to 22. For double-precision (64-bit), it's 52 and up to 51. (If you zero out all fraction bits, the special value NaN is lost: it collapses into +/-inf.)
Especially if the data represents decimal values such as 1.2345, this will help in data compression. That is because 1.2345 cannot be represented exactly as a binary floating point value, but rather as 0x3ff3c083126e978d, which is not friendly to data compression. Chopping off the least significant 24 bits will result in 0x3ff3c08312000000, which is still accurate to about 9 decimal digits (in this example, the difference is 1.6e-9).
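You can check those numbers with a couple of lines of Python; struct is used here only to view the raw bit pattern of the double.

import struct

bits = struct.unpack('<Q', struct.pack('<d', 1.2345))[0]
print(hex(bits))                   # 0x3ff3c083126e978d
chopped = bits & ~((1 << 24) - 1)  # zero the 24 least significant bits
print(hex(chopped))                # 0x3ff3c08312000000
print(1.2345 - struct.unpack('<d', struct.pack('<Q', chopped))[0])  # ~1.6e-09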
If you do this on the raw data and then store the differences between consecutive numbers, the result will be even more compression-friendly (via gzip) if the raw data varies slowly; a sketch combining truncation, differencing, and gzip follows the Python example below.
Here is an example in C:
#include <stdint.h>
#include <string.h>

/* Zero out the 'zerobits' least significant fraction bits of x. */
double double_trunc(double x, int zerobits)
{
    /* mask is e.g. 0xffffffffffff0000 for zerobits==16 */
    uint64_t mask = ~((UINT64_C(1) << zerobits) - 1);
    uint64_t floatbits;
    memcpy(&floatbits, &x, sizeof floatbits);  /* reinterpret the bits without aliasing issues */
    floatbits &= mask;
    memcpy(&x, &floatbits, sizeof x);
    return x;
}
And one in python/numpy:
import numpy as np

def float_trunc(a, zerobits):
    """Set the least significant <zerobits> bits to zero in a numpy float32 or float64 array.
    Do this in-place. Also return the updated array.
    Maximum values of 'zerobits': 51 for float64; 22 for float32.
    """
    at = a.dtype
    assert at == np.float64 or at == np.float32 or at == np.complex128 or at == np.complex64
    if at == np.float64 or at == np.complex128:
        assert zerobits <= 51
        # mask with the low 'zerobits' bits cleared
        mask = 0xffffffffffffffff - (1 << zerobits) + 1
        bits = a.view(np.uint64)
        bits &= mask
    elif at == np.float32 or at == np.complex64:
        assert zerobits <= 22
        mask = 0xffffffff - (1 << zerobits) + 1
        bits = a.view(np.uint32)
        bits &= mask
    return a
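To illustrate the combination suggested above (truncate, take differences, then gzip), here is a rough sketch. It reuses float_trunc from above; the test signal and the choice of 32 truncated bits are made up for the example, and it only compares compressed sizes: a real tool would also store the first value and invert the differencing when decompressing.

import gzip
import numpy as np

# Slowly varying, correlated stand-in for the real time series
t = np.linspace(0, 100, 200_000)
data = np.sin(t) + 1e-3 * np.random.randn(t.size)

def gzipped_size(arr):
    """Bytes occupied by the array's raw binary data after gzip."""
    return len(gzip.compress(arr.tobytes()))

truncated = float_trunc(data.copy(), 32)   # copy, since float_trunc works in-place

print(gzipped_size(data))                  # raw doubles: compress poorly
print(gzipped_size(truncated))             # truncated: noticeably smaller
print(gzipped_size(np.diff(truncated)))    # truncated + differenced: often smaller still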