I want to read a binary file in Python, the exact layout of which is stored in the binary file itself.
The file contains a sequence of two-dimensional arrays, with the row and column dimensions of each array stored as a pair of integers preceding its contents. I want to successively read all of the arrays contained within the file.
I know this can be done with f = open("myfile", "rb") and f.read(numberofbytes), but this is quite clumsy because I would then need to convert the output into meaningful data structures myself. I would like to use numpy's np.fromfile with a custom dtype, but I have not found a way to read part of the file, leave it open, and then continue reading with a modified dtype.
I know I can use os to f.seek(numberofbytes, os.SEEK_SET) and then call np.fromfile multiple times, but this would mean a lot of unnecessary jumping around in the file.
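To make the layout concrete, here is a minimal sketch of that seek-and-read approach, assuming (as an illustration, not from the question) that each array is stored as two int64 dimensions followed by float64 contents; the file name and dtypes are placeholders:

```python
import os
import numpy as np

# Write a small sample file in the assumed layout: two int64 dims,
# then the float64 array contents.
a = np.arange(12, dtype=np.float64).reshape(3, 4)
with open("/tmp/sample.bin", "wb") as f:
    np.array(a.shape, dtype=np.int64).tofile(f)
    a.tofile(f)

# The seek-based reading approach: explicitly reposition before each read.
with open("/tmp/sample.bin", "rb") as f:
    f.seek(0, os.SEEK_SET)
    dims = np.fromfile(f, dtype=np.int64, count=2)
    # seek to the start of the array data (just past the two dims)...
    f.seek(dims.nbytes, os.SEEK_SET)
    # ...and read the contents, reshaping to the stored dimensions
    arr = np.fromfile(f, dtype=np.float64, count=int(dims.prod())).reshape(dims)
```

This works, but the explicit offset bookkeeping is exactly the "jumping around" the question wants to avoid.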
In short, I want MATLAB's fread (or at least something like C++'s ifstream::read).
What is the best way to do this?
You can pass an open file object to np.fromfile, read the dimensions of the first array, then read the array contents (again using np.fromfile), and repeat the process for additional arrays within the same file.
For example:
import numpy as np
import os

def iter_arrays(fname, array_ndim=2, dim_dtype=np.int64, array_dtype=np.double):
    with open(fname, 'rb') as f:
        fsize = os.fstat(f.fileno()).st_size
        # while we haven't yet reached the end of the file...
        while f.tell() < fsize:
            # read the dimensions of this array
            dims = np.fromfile(f, dim_dtype, array_ndim)
            # read the array contents and reshape to the stored dimensions
            yield np.fromfile(f, array_dtype, int(np.prod(dims))).reshape(dims)
Example usage:
# write some random arrays to an example binary file
x = np.random.randn(100, 200)
y = np.random.randn(300, 400)

with open('/tmp/testbin', 'wb') as f:
    np.array(x.shape, dtype=np.int64).tofile(f)
    x.tofile(f)
    np.array(y.shape, dtype=np.int64).tofile(f)
    y.tofile(f)
# read the contents back
x1, y1 = iter_arrays('/tmp/testbin')
# check that they match the input arrays
assert np.allclose(x, x1) and np.allclose(y, y1)
If the arrays are large, you might consider using np.memmap with the offset= parameter in place of np.fromfile, to get the contents of the arrays as memory maps rather than loading them into RAM.
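A minimal sketch of that memmap variant, assuming the same layout as above (int64 dimensions followed by float64 data); the function name iter_memmaps and the dtype defaults are illustrative choices, not a fixed API:

```python
import os
import numpy as np

def iter_memmaps(fname, array_ndim=2, dim_dtype=np.int64, array_dtype=np.float64):
    fsize = os.path.getsize(fname)
    offset = 0
    with open(fname, 'rb') as f:
        while offset < fsize:
            # read the dimensions of this array from the current offset
            f.seek(offset)
            dims = np.fromfile(f, dim_dtype, array_ndim)
            offset += dims.nbytes
            # map the array contents instead of loading them into RAM
            yield np.memmap(fname, dtype=array_dtype, mode='r',
                            offset=offset, shape=tuple(dims))
            # advance past this array's data to the next header
            offset += int(np.prod(dims)) * np.dtype(array_dtype).itemsize
```

Each yielded array is backed by the file, so only the pages you actually touch are read from disk.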