Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Binary storage of floating point values (between 0 and 1) using less than 4 bytes?

I need to store a massive numpy vector to disk. Right now the vector that I am trying to store is ~2.4 billion elements long and the data is float64. This takes about 18GB of space when serialized out to disk.

If I use struct.pack() and use float32 (4 bytes) I can reduce it to ~9GB. I don't need anywhere near this amount of precision disk space is going to quickly becomes an issue as I expect the number of values I need to store could grow by an order of magnitude or two.

I was thinking that if I could access the first 4 significant digits I could store those values in an int and only use 1 or 2 bytes of space. However, I have no idea how to do this efficiently. Does anyone have any idea or suggestions?

like image 621
Ryan Hope Avatar asked May 08 '15 18:05

Ryan Hope


People also ask

How is a float stored in 4 bytes?

Single-precision values with float type have 4 bytes, consisting of a sign bit, an 8-bit excess-127 binary exponent, and a 23-bit mantissa. The mantissa represents a number between 1.0 and 2.0. Since the high-order bit of the mantissa is always 1, it is not stored in the number.

Is a float always 4 bytes?

Yes it has 4 bytes only but it is not guaranteed.

How are floating point numbers stored in binary?

Scalars of type float are stored using four bytes (32-bits). The format used follows the IEEE-754 standard. The mantissa represents the actual binary digits of the floating-point number.


1 Answers

If your data is between 0 and 1, and 16bit is enough you can save the data as uint16:

data16 = (65535 * data).round().astype(uint16)

and expand the data with

data = data16 / 65535.0
like image 155
Daniel Avatar answered Oct 14 '22 02:10

Daniel