I am reading a binary file using the following method:
numpy.fromfile(file, dtype=)
The binary file contains multiple data types and I know the layout. Therefore I have defined a dtype list as follows:
dtypearr = [('a','i4',1),('b','S1',8),('c','i4',1),
            ('d','i4',1),('e','S1',8)]
This dtype array says that the binary file starts with one 4-byte integer, followed by 8 characters, and so on.
The problem I am having is that the binary file is not the size of dtypearr: the structure defined in dtypearr repeats n times in the file.
So far, I have repeated dtypearr with new field names until it matches the size of the binary file.
However, I was hoping to achieve this without repeating dtypearr. Instead, I want an array stored in each field; for example, I want structuredarray['a'] or structuredarray['b'] to give me an array instead of a single value.
Edit
Note that:
numpy.fromfile(file, dtype=dtypearr)
achieves what I want when the pattern repeats exactly. The solution below also works.
However, the pattern in the binary file I mentioned isn't exactly repeating. For example, there is a header portion and multiple subsections, and each subsection has its own repeating pattern. f.seek() works for the last subsection, but not for the subsections before it.
Try:
import numpy as np
import string
# Create some fake data
N = 10
dtype = np.dtype([('a', 'i4'), ('b', 'S8'), ('c', 'f8')])
a = np.zeros(N, dtype)
a['a'] = np.random.randint(0, 4, N)  # random_integers is deprecated; randint's upper bound is exclusive
a['b'] = np.array([x for x in string.ascii_lowercase[:N]])
a['c'] = np.random.normal(size=(N,))
# Write to a binary file
a.tofile('test.dat')
# Read data into new array
b = np.fromfile('test.dat', dtype=dtype)
The arrays a and b are identical (i.e. np.all(a['a'] == b['a']) is True):
for col in a.dtype.names:
    print(col, np.all(a[col] == b[col]))
# Prints:
# a True
# b True
# c True
Update:
If you have header information, you can first open the file, seek to the starting point of the data and then read. For example:
f = open("test.dat", "rb")
f.seek(header_size)
b = np.fromfile(f, dtype=dtype)
f.close()
You have to know the size (header_size), but then you should be good. If there are subsections, you can supply a count argument to np.fromfile to limit how many items each read grabs; successive reads then pick up where the previous one left off. I haven't tested whether count works for your format. If you are not bound to this binary format, I would recommend using something like HDF5 to store multiple arrays in a single file.
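To illustrate, here is a sketch of the count approach for a hypothetical layout: a 16-byte header followed by two subsections with different record types and known record counts n1 and n2. The file name, dtypes, and counts are made up for the example; adapt them to your actual format.

```python
import numpy as np

# Hypothetical layout: 16-byte header, then two subsections with
# different record types and known record counts n1 and n2.
dt1 = np.dtype([('a', 'i4'), ('b', 'S8')])
dt2 = np.dtype([('c', 'f8')])
n1, n2 = 3, 5

# Write a fake file with that layout so the read below is runnable.
with open('multi.dat', 'wb') as f:
    f.write(b'\x00' * 16)                       # fake header
    s1 = np.zeros(n1, dt1)
    s1['a'] = np.arange(n1)
    s1.tofile(f)
    np.arange(n2, dtype='f8').tofile(f)

# Read it back: seek past the header, then let count limit each read.
# np.fromfile advances the file position, so the reads chain naturally.
with open('multi.dat', 'rb') as f:
    f.seek(16)                                  # skip header
    sub1 = np.fromfile(f, dtype=dt1, count=n1)  # first subsection
    sub2 = np.fromfile(f, dtype=dt2, count=n2)  # second subsection

print(sub1['a'])   # [0 1 2]
print(sub2['c'])   # [0. 1. 2. 3. 4.]
```

Because each np.fromfile call leaves the file position at the end of what it just read, you can walk through any sequence of subsections this way as long as you know each one's dtype and count.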