I need to store a massive numpy vector to disk. Right now the vector that I am trying to store is ~2.4 billion elements long and the data is float64. This takes about 18GB of space when serialized to disk.
If I use struct.pack() with float32 (4 bytes) I can reduce it to ~9GB. I don't need anywhere near this amount of precision, and disk space is quickly going to become an issue, as I expect the number of values I need to store could grow by an order of magnitude or two.
I was thinking that if I could access the first 4 significant digits, I could store those values in an int and use only 1 or 2 bytes of space. However, I have no idea how to do this efficiently. Does anyone have any ideas or suggestions?
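For context, here is roughly the float32 approach I described, written with numpy's astype/tofile rather than struct.pack (which is far too slow element-by-element at this scale); the array size and filename are just placeholders:

import numpy as np

data = np.random.random(10_000_000)  # stand-in for the real ~2.4e9-element float64 vector

# Cast to float32 and dump the raw bytes: 4 bytes per element instead of 8.
data.astype(np.float32).tofile("vector_f32.bin")

# Reading it back: .tofile stores no metadata, so the dtype must be given explicitly.
restored = np.fromfile("vector_f32.bin", dtype=np.float32)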
A single-precision float occupies 4 bytes, consisting of a sign bit, an 8-bit excess-127 binary exponent, and a 23-bit mantissa. The mantissa represents a number between 1.0 and 2.0. Since the high-order bit of the mantissa is always 1, it is not stored in the number.
Yes, it is typically 4 bytes, but that size is not guaranteed for a C float by the standard; numpy's float32, however, is always exactly 4 bytes.
Scalars of type float are stored using four bytes (32 bits). The format follows the IEEE-754 standard. The mantissa represents the actual binary digits of the floating-point number.
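To make that layout concrete, here is a small sketch that extracts the three fields from a float32's raw bits (the example value 6.25 is arbitrary):

import struct

# Reinterpret the 4 bytes of a float32 as an unsigned 32-bit integer.
(bits,) = struct.unpack("<I", struct.pack("<f", 6.25))

sign     = bits >> 31            # 1 sign bit
exponent = (bits >> 23) & 0xFF   # 8-bit excess-127 exponent
mantissa = bits & 0x7FFFFF       # 23 stored mantissa bits; the leading 1 is implicit

# 6.25 = 1.5625 * 2**2, so exponent - 127 == 2 and 1 + mantissa/2**23 == 1.5625
print(sign, exponent - 127, 1 + mantissa / 2**23)  # -> 0 2 1.5625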
If your data is between 0 and 1 and 16 bits of precision are enough, you can save the data as uint16 (with numpy imported as np):

data16 = (65535 * data).round().astype(np.uint16)

and expand the data back with

data = data16 / 65535.0
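A self-contained round trip of that scheme, assuming the values really are in [0, 1] (array and file names are illustrative); the worst-case reconstruction error is half a quantization step, about 7.6e-6:

import numpy as np

data = np.random.random(1_000_000)  # values in [0, 1)

# Quantize: map [0, 1] onto the 65536 levels of a uint16.
data16 = (65535 * data).round().astype(np.uint16)
data16.tofile("vector_u16.bin")  # 2 bytes per element on disk

# Dequantize after loading.
restored = np.fromfile("vector_u16.bin", dtype=np.uint16) / 65535.0

# Maximum error is 0.5 / 65535, roughly 7.6e-6.
print(np.abs(data - restored).max())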