I'm using numpy and Python 2.7 to compute large (100 million+ elements) boolean arrays for a super-massive prime sieve and write them to binary files to read at a much later time. NumPy bools are 8-bit, so the file size that I'm writing is much larger than necessary. Since I'm writing a large number of these files I'd like to keep them as small as humanly possible without having to waste a lot of time/memory converting them to a bitarray and back.
I was originally going to switch to using the bitarray module to keep file size down, but the sieve computation time increased by around 400% with the same algorithms, which is a bit unacceptable. Is there a fast-ish way to write and read back the ndarray in a smaller file, or is this a trade-off that I'm just going to have to deal with?
Use numpy.packbits to turn it into a uint8 array for writing, then numpy.unpackbits after reading it back. numpy.packbits pads the axis you're packing along with zeros to reach a multiple of 8, so keep track of how many bits you'll need to chop off the end when you unpack the array.
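A minimal sketch of that round trip (the file path is just an illustration; the key point is remembering the original length so the zero padding can be sliced off after unpacking):

```python
import numpy as np
import os
import tempfile

# A boolean sieve segment (length deliberately not a multiple of 8)
sieve = np.ones(20, dtype=bool)
sieve[::2] = False
n = len(sieve)  # remember the true length so the padding can be stripped later

# Pack 8 booleans per byte: 20 bools -> 3 bytes on disk instead of 20
packed = np.packbits(sieve)

path = os.path.join(tempfile.mkdtemp(), "sieve.bits")  # hypothetical path
packed.tofile(path)

# Read back, unpack, then slice off the zero bits packbits padded on
restored = np.unpackbits(np.fromfile(path, dtype=np.uint8))[:n].astype(bool)

assert np.array_equal(sieve, restored)
```

Both packbits and unpackbits are vectorized C loops, so the conversion is far cheaper than working with a bitarray throughout the sieve computation.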