I'm looking at using the Amazon cloud for all my simulation needs. The resulting simulation files are quite large, and I would like to move them to my local drive for ease of analysis, etc. You pay for the amount of data you transfer out, so I want to compress all my simulation solutions as small as possible. They are simply NumPy arrays saved as .mat files, using:
import scipy.io as sio
# savemat needs a dict mapping MATLAB variable names to the arrays to store
sio.savemat(filepath, {'solution': solution}, do_compression=True)
So my question is: what is the best way to compress NumPy arrays (they are currently stored in .mat files, but I could store them using any Python method)? Should I use Python-side compression when saving, Linux command-line compression, or both?
I am working in a Linux environment, and I am open to any kind of file compression.
It can compress binary data very efficiently, storing arrays either on disk or compressed in memory. Its compression is based on Blosc. See the SciPy video for some context.
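Blosc itself is not shown here. As a hedged illustration of one idea behind it, the sketch below implements a byte-shuffle filter (the kind of transform Blosc applies before compressing numeric data) in NumPy and feeds the result to the standard-library zlib. The helper names `byte_shuffle`/`byte_unshuffle` and the synthetic `field` array are hypothetical; this is not Blosc's API.

```python
import zlib

import numpy as np


def byte_shuffle(arr: np.ndarray) -> bytes:
    """Regroup bytes so byte 0 of every element comes first, then byte 1, etc."""
    flat = np.ascontiguousarray(arr).reshape(-1)
    # One row per element, one column per byte of that element.
    as_bytes = flat.view(np.uint8).reshape(-1, flat.itemsize)
    # Transposing puts each byte position in its own contiguous run.
    return as_bytes.T.tobytes()


def byte_unshuffle(data: bytes, dtype, shape) -> np.ndarray:
    """Invert byte_shuffle, given the original dtype and shape."""
    itemsize = np.dtype(dtype).itemsize
    planes = np.frombuffer(data, dtype=np.uint8).reshape(itemsize, -1)
    return planes.T.copy().view(dtype).reshape(shape)


# Smooth data, loosely standing in for a simulation field (an assumption).
x = np.linspace(0.0, 10.0, 100_000)
field = np.sin(x) * np.exp(-0.1 * x)

plain = zlib.compress(field.tobytes(), 9)
shuffled = zlib.compress(byte_shuffle(field), 9)
print("zlib only:", len(plain), "shuffle + zlib:", len(shuffled))

restored = byte_unshuffle(zlib.decompress(shuffled), field.dtype, field.shape)
assert np.array_equal(restored, field)
```

The shuffle is lossless and costs one pass over the data; whether it shrinks the output depends on how similar neighbouring values' high-order bytes are, so it is worth measuring on real simulation output.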
Arithmetic coding, a widely used entropy-coding technique, keeps an internal memory state so that it does not have to map each input symbol one-to-one onto a distinct code with a whole number of bits; it clears that state only after the entire string of data symbols has been encoded.
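The passage above describes arithmetic coding. A minimal sketch of its interval-narrowing core, using exact fractions and a hypothetical two-symbol alphabet (`PROBS`), shows how the running `(low, width)` state lets a message cost a fractional number of bits per symbol. It omits the incremental bit output and renormalization a practical coder needs.

```python
from fractions import Fraction
from math import ceil, log2

# Hypothetical two-symbol source: 'a' is three times as likely as 'b'.
PROBS = {"a": Fraction(3, 4), "b": Fraction(1, 4)}


def starts(probs):
    """Lower bound of each symbol's slice of [0, 1), in dict order."""
    out, total = {}, Fraction(0)
    for sym, p in probs.items():
        out[sym] = total
        total += p
    return out


def encode(message, probs=PROBS):
    """Narrow [low, low + width) once per symbol; (low, width) is the coder's state."""
    lo_of = starts(probs)
    low, width = Fraction(0), Fraction(1)
    for sym in message:
        low += width * lo_of[sym]
        width *= probs[sym]
    return low, width


def decode(value, length, probs=PROBS):
    """Recover `length` symbols from any number inside the final interval."""
    lo_of = starts(probs)
    low, width = Fraction(0), Fraction(1)
    out = []
    for _ in range(length):
        for sym, p in probs.items():
            sub_low = low + width * lo_of[sym]
            if sub_low <= value < sub_low + width * p:
                out.append(sym)
                low, width = sub_low, width * p
                break
    return "".join(out)


msg = "aaaaaaab"
low, width = encode(msg)
# At most ceil(-log2(width)) + 1 bits are enough to name a number inside the interval.
print(f"{len(msg)} symbols, about {ceil(-log2(width)) + 1} bits")
assert decode(low, len(msg)) == msg
```

For this skewed source the whole 8-symbol message fits in about 6 bits, whereas any code that gives each symbol its own whole-number-of-bits codeword needs at least 8.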
Note that numpy.compress() is unrelated to data compression despite its name: it returns the slices of an array along the given axis for which a boolean condition is True.
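A short example of what `numpy.compress()` actually does, on a hypothetical 3-by-2 array:

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])

# Keep the rows (axis 0) where the condition is True.
rows = np.compress([False, True, True], a, axis=0)  # [[3, 4], [5, 6]]

# Keep the columns (axis 1) where the condition is True.
cols = np.compress([False, True], a, axis=1)  # [[2], [4], [6]]

# The same row selection written with boolean indexing.
same_rows = a[np.array([False, True, True])]

# The result is just a smaller array; no bytes are compressed.
print(rows, cols, sep="\n")
```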
NumPy uses much less memory to store data. NumPy arrays take significantly less memory than Python lists, and NumPy lets you specify the data type of the contents, which allows further optimization.
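To see the difference, and the saving from a narrower dtype, compare byte counts. A small sketch; switching to `float32` is an assumption that only holds if the simulation tolerates single precision, since the conversion is lossy.

```python
import sys

import numpy as np

n = 1000
values = [float(i) for i in range(n)]

# A list stores pointers to separate Python float objects; count both.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# A NumPy array stores the raw values contiguously: 8 bytes each for float64.
arr64 = np.array(values, dtype=np.float64)

# Halving the precision halves the size, but float32 keeps only ~7 significant digits.
arr32 = arr64.astype(np.float32)

print("list:", list_bytes, "float64:", arr64.nbytes, "float32:", arr32.nbytes)
```

Shrinking the dtype before saving reduces the transfer size regardless of which compressor runs afterwards.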
Unless you know something special about the arrays (e.g. sparsity, or some repeating pattern), you aren't going to do much better than the default compression, plus maybe gzip on top of that. In fact, you may not even need to gzip the files if you download them over HTTP and your server is configured to compress responses. Good lossless compression algorithms rarely differ from one another by more than about 10% in compressed size.
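Because the differences between codecs are usually small, it is worth measuring them on your own data before choosing. A hedged sketch using only standard-library codecs on an array's raw bytes; the two synthetic arrays are stand-ins for simulation output (an assumption). The Linux `gzip`, `bzip2` and `xz` tools use the same algorithms as `zlib`, `bz2` and `lzma` here.

```python
import bz2
import lzma
import zlib

import numpy as np

# name -> (compress, decompress); all three ship with the Python standard library.
CODECS = {
    "zlib (deflate, as in gzip)": (lambda b: zlib.compress(b, 9), zlib.decompress),
    "bz2": (lambda b: bz2.compress(b, 9), bz2.decompress),
    "lzma (xz)": (lambda b: lzma.compress(b, preset=9), lzma.decompress),
}


def compressed_sizes(arr: np.ndarray) -> dict:
    """Compress the array's raw bytes with each codec and return the sizes."""
    raw = np.ascontiguousarray(arr).tobytes()
    sizes = {"raw": len(raw)}
    for name, (compress, decompress) in CODECS.items():
        blob = compress(raw)
        assert decompress(blob) == raw  # every codec here is lossless
        sizes[name] = len(blob)
    return sizes


rng = np.random.default_rng(0)
noise = rng.standard_normal(100_000)  # no structure to exploit
sparse = np.zeros(100_000)
sparse[::1000] = 1.0  # mostly zeros

for label, arr in [("random noise", noise), ("mostly zeros", sparse)]:
    print(label, compressed_sizes(arr))
```

The mostly-zero array shrinks dramatically with every codec, while random noise barely shrinks at all, which is the "something special about the arrays" the answer refers to.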
If savemat works as advertised, you should be able to get gzip compression entirely in Python with:
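import gzip
import scipy.io as sio
# savemat accepts a file-like object; the dict maps MATLAB variable names to arrays.
# The with-block flushes and closes the gzip stream when writing is done.
with gzip.open(filepath_dot_gz, 'wb') as f_out:
    sio.savemat(f_out, {'solution': solution}, do_compression=True)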
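For arrays that don't need to be MATLAB-readable, the same gzip-wrapping idea works with NumPy's own .npy format and needs no SciPy. A minimal sketch; the helpers `save_npy_gz`/`load_npy_gz` and the `solution` array are hypothetical.

```python
import gzip
import os
import tempfile

import numpy as np


def save_npy_gz(path: str, arr: np.ndarray) -> None:
    """Write arr in .npy format through a gzip stream (hypothetical helper)."""
    with gzip.open(path, "wb") as f:
        np.save(f, arr)


def load_npy_gz(path: str) -> np.ndarray:
    """Read back an array written by save_npy_gz."""
    with gzip.open(path, "rb") as f:
        return np.load(f)


solution = np.zeros((500, 500))
solution[::50, ::50] = 1.0  # stand-in for a simulation result

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "solution.npy.gz")
    save_npy_gz(path, solution)
    print("raw:", solution.nbytes, "bytes; on disk:", os.path.getsize(path), "bytes")
    assert np.array_equal(load_npy_gz(path), solution)
```

NumPy also ships `np.savez_compressed`, which writes a zip archive of deflate-compressed .npy members that `np.load` reads back directly.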