
Compress numpy arrays efficiently

I have tried various methods of compressing some numpy arrays when saving them to disk.

These 1D arrays contain data sampled at a fixed sampling rate (for example sound recorded with a microphone, or any other measurement from a sensor): the underlying signal is essentially continuous in the mathematical sense, although after sampling it is of course discrete data.

I tried HDF5 (h5py):

f.create_dataset("myarray1", data=myarray, compression="gzip", compression_opts=9)

but this is quite slow, and the compression ratio is not the best we can expect.

I also tried with

numpy.savez_compressed() 

but once again this may not be the best compression algorithm for the kind of data described above.

What would you choose to get a better compression ratio on a numpy array containing such data?

(I thought about something like lossless FLAC, which was originally designed for audio, but is there an easy way to apply such an algorithm to numpy data?)

Basj asked Mar 14 '14 09:03



1 Answer

What I do now:

import gzip
import numpy

f = gzip.GzipFile("my_array.npy.gz", "w")
numpy.save(file=f, arr=my_array)
f.close()
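The snippet above only covers writing. As a minimal sketch of the full round trip, the same gzip stream can be passed to numpy.load to read the array back. The helper names save_gz and load_gz, the temporary path, and the synthetic 440 Hz test signal are assumptions made for illustration and are not part of the original answer.

```python
import gzip
import os
import tempfile

import numpy as np


def save_gz(path, arr):
    """Write arr in .npy format through a gzip stream (hypothetical helper)."""
    with gzip.GzipFile(path, "w") as f:
        np.save(f, arr)


def load_gz(path):
    """Read back an array written by save_gz (hypothetical helper)."""
    with gzip.GzipFile(path, "r") as f:
        return np.load(f)


# Assumed test signal: one second of a 440 Hz tone sampled at 44.1 kHz,
# stored as 16-bit integers like typical microphone data.
t = np.arange(44100) / 44100.0
my_array = (10000 * np.sin(2 * np.pi * 440 * t)).astype(np.int16)

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "my_array.npy.gz")

save_gz(path, my_array)
restored = load_gz(path)

# Compare the in-memory size with the size of the compressed file on disk.
raw_size = my_array.nbytes
gz_size = os.path.getsize(path)
print("round trip ok:", np.array_equal(my_array, restored))
print("raw bytes:", raw_size, "gzip bytes:", gz_size)
```

Printing both sizes makes it easy to compare this approach with the others on your own data; the ratio you get depends heavily on the signal, so no particular figure is claimed here.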
Albert answered Sep 21 '22 11:09