
Reading a binary file with numpy structured array

I am reading a binary file using the following method

numpy.fromfile(file, dtype=)

The binary file has multiple types present and I know the organization. Therefore I have defined a dtype array as follows:

dtypearr = [('a','i4',1),('b','S1',8),('c','i4',1),
 ('d','i4',1),('e','S1',8)]

This dtype array says that the binary file starts with one integer, followed by 8 characters, and so on.

The problem I am having is that the binary file is not the size of dtypearr. The binary file has the structure defined in dtypearr repeating n times.

So far, what I have done is repeat the dtypearr with new field names until it is the same size as the binary file.

However, I was hoping to achieve this goal without repeating dtypearr. Instead, I want an array to be stored in each field. For example, I want structuredarray['a'] or structuredarray['b'] to give me an array instead of a single value.
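For a uniformly repeating pattern, this is exactly what passing the record dtype to np.fromfile does: each field index returns one array with an entry per record. A minimal sketch (file name and values are made up; S8 is used in place of ('S1', 8), which has the same 8-byte layout):

```python
import numpy as np

# One record: int32, 8 chars, int32, int32, 8 chars
rec = np.dtype([('a', 'i4'), ('b', 'S8'), ('c', 'i4'),
                ('d', 'i4'), ('e', 'S8')])

# Write three repetitions of the record as fake data
data = np.zeros(3, dtype=rec)
data['a'] = [1, 2, 3]
data.tofile('records.dat')

# Reading with the same dtype yields one array per field
arr = np.fromfile('records.dat', dtype=rec)
print(arr['a'])  # prints [1 2 3]
```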

Edit

Note that:

numpy.fromfile(file, dtype=dtypearr)

achieves what I want when the pattern repeats exactly. The solution below also works.

However, the pattern in the binary file I mentioned isn't an exact repeat. For example, there is a header portion followed by multiple subsections, and each subsection has its own repeating pattern. f.seek() will work for the last subsection, but not for the subsections before it.
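One way to handle a layout like that is to read the file piecewise: np.fromfile stops after count records and leaves the file position at the next byte, so each subsection can be read with its own dtype. A sketch with an invented layout (header dtypes, field names, and counts are all assumptions for illustration):

```python
import numpy as np

hdr = np.dtype([('n1', 'i4'), ('n2', 'i4')])   # invented header: record counts
rec1 = np.dtype([('a', 'i4'), ('b', 'S8')])    # subsection 1 record type
rec2 = np.dtype([('c', 'f8')])                 # subsection 2 record type

# Build a fake file: header, then n1 rec1 records, then n2 rec2 records
h = np.array([(2, 3)], dtype=hdr)
s1 = np.zeros(2, dtype=rec1)
s1['a'] = [10, 20]
s2 = np.zeros(3, dtype=rec2)
s2['c'] = [0.5, 1.5, 2.5]
with open('mixed.dat', 'wb') as f:
    h.tofile(f)
    s1.tofile(f)
    s2.tofile(f)

# Read it back piecewise: count= stops at the subsection boundary, and the
# file position advances, so the next read picks up where the last left off
with open('mixed.dat', 'rb') as f:
    head = np.fromfile(f, dtype=hdr, count=1)
    n1 = int(head['n1'][0])
    n2 = int(head['n2'][0])
    sub1 = np.fromfile(f, dtype=rec1, count=n1)
    sub2 = np.fromfile(f, dtype=rec2, count=n2)

print(sub1['a'], sub2['c'])
```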

asked Mar 06 '26 by snowleopard

1 Answer

Try:

import numpy as np
import string

# Create some fake data
N = 10
dtype = np.dtype([('a', 'i4'), ('b', 'S8'), ('c', 'f8')])
a = np.zeros(N, dtype)
a['a'] = np.random.randint(0, 4, N)  # random_integers was removed from NumPy
a['b'] = np.array([x for x in string.ascii_lowercase[:N]])
a['c'] = np.random.normal(size=(N,))

# Write to a binary file
a.tofile('test.dat')

# Read data into new array
b = np.fromfile('test.dat', dtype=dtype)

The arrays a and b are identical (i.e. np.all(a['a'] == b['a']) is True):

for col in a.dtype.names:
    print(col, np.all(a[col] == b[col]))

# Prints:
# a True
# b True
# c True

Update:

If you have header information, you can first open the file, seek to the starting point of the data and then read. For example:

with open("test.dat", "rb") as f:
    f.seek(header_size)
    b = np.fromfile(f, dtype=dtype)

You have to know the size (header_size), but then you should be good. If there are subsections, you can supply a count of the number of items to grab. I haven't tested whether count works. If you are not bound to this binary format, I would recommend using something like HDF5 to store multiple arrays in a single file.
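For what it's worth, count does work on an open file handle in current NumPy, and the file position is left just past the records read, so successive calls walk through the file. A quick check with a throwaway file:

```python
import numpy as np

dtype = np.dtype([('a', 'i4'), ('b', 'S8'), ('c', 'f8')])
a = np.zeros(6, dtype)
a['a'] = range(6)
a.tofile('test2.dat')

with open('test2.dat', 'rb') as f:
    first = np.fromfile(f, dtype=dtype, count=2)  # records 0-1
    rest = np.fromfile(f, dtype=dtype, count=4)   # records 2-5

print(first['a'], rest['a'])  # prints [0 1] [2 3 4 5]
```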

answered Mar 07 '26 by JoshAdel


