This might be a stupid question but I'm unable to find a proper answer to it. I want to store (don't ask why) a binary representation of a (2000, 2000, 2000)
array of zeros into disk, binary format. The traditional approach to achieve so would be:
with open('myfile', 'wb') as f:
f.write('\0' * 4 * 2000 * 2000 * 2000) # 4 bytes = float32
But that would imply creating a very large string which is not necessary at all. I know of two other options:
Iterate over the elements and store one byte at a time (extremely slow)
Create a numpy array and flush it to disk (as memory expensive as the string creation in the example above)
I was hopping to find something like write(char, ntimes)
(as it exists in C and other languages) to copy on disk char
ntimes
at C speed, and not at Python-loops speed, without having to create such a big array on memory.
I don't know why you are making such a fuss about "Python's loop speed", but writing in the way of
for i in range(2000 * 2000):
f.write('\0' * 4 * 2000) # 4 bytes = float32
will tell the OS to write 8000 0-bytes. After write
returns, in the next loop run it is called again.
It might be that the loop is executed slightly slower than it would be in C, but that definitely won't make a difference.
If it is ok to have a sparse file, you as well can seek to the desired file size's position and then truncate the file.
This would be a valid answer to fill a file from Python using numpy's memmap:
shape = (2000, 2000, 2000) # or just (2000 * 2000 * 2000,)
fp = np.memmap(filename, dtype='float32', mode='w+', shape=shape)
fp[...] = 0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With