
Reading a binary file with numpy structured array

I am reading a binary file using the following method

numpy.fromfile(file, dtype=)

The binary file has multiple types present and I know the organization. Therefore I have defined a dtype array as follows:

dtypearr = [('a','i4',1),('b','S1',8),('c','i4',1),
 ('d','i4',1),('e','S1',8)]

This dtype array says that the binary file starts with one integer, followed by 8 characters, and so on.

The problem I am having is that the binary file is not the size of dtypearr. The binary file has the structure defined in dtypearr repeating n times.

So far, what I have done is repeat the dtypearr with new field names until it is the same size as the binary file.

However, I was hoping to achieve this goal without repeating dtypearr. Instead, I want an array to be stored in each field. For example, I want structuredarray['a'] or structuredarray['b'] to give me an array instead of a single value.
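For a uniformly repeating pattern, this is exactly what passing the record dtype to np.fromfile does: each field index returns one array with an entry per record. A minimal sketch (file name and values are made up; S8 is used in place of ('S1', 8), which has the same 8-byte layout):

```python
import numpy as np

# One record: int32, 8 chars, int32, int32, 8 chars
rec = np.dtype([('a', 'i4'), ('b', 'S8'), ('c', 'i4'),
                ('d', 'i4'), ('e', 'S8')])

# Write three repetitions of the record as fake data
data = np.zeros(3, dtype=rec)
data['a'] = [1, 2, 3]
data.tofile('records.dat')

# Reading with the same dtype yields one array per field
arr = np.fromfile('records.dat', dtype=rec)
print(arr['a'])  # prints [1 2 3]
```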

Edit

Note that:

numpy.fromfile(file, dtype=dtypearr)

achieves what I want when the pattern repeats exactly. The solution below also works.

However, the pattern in the binary file I mentioned isn't an exact repeat. For example, there is a header portion followed by multiple subsections, and each subsection has its own repeating pattern. f.seek() will work for the last subsection, but not for the subsections before it.
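One way to handle a layout like that is to read the file piecewise: np.fromfile stops after count records and leaves the file position at the next byte, so each subsection can be read with its own dtype. A sketch with an invented layout (header dtypes, field names, and counts are all assumptions for illustration):

```python
import numpy as np

hdr = np.dtype([('n1', 'i4'), ('n2', 'i4')])   # invented header: record counts
rec1 = np.dtype([('a', 'i4'), ('b', 'S8')])    # subsection 1 record type
rec2 = np.dtype([('c', 'f8')])                 # subsection 2 record type

# Build a fake file: header, then n1 rec1 records, then n2 rec2 records
h = np.array([(2, 3)], dtype=hdr)
s1 = np.zeros(2, dtype=rec1)
s1['a'] = [10, 20]
s2 = np.zeros(3, dtype=rec2)
s2['c'] = [0.5, 1.5, 2.5]
with open('mixed.dat', 'wb') as f:
    h.tofile(f)
    s1.tofile(f)
    s2.tofile(f)

# Read it back piecewise: count= stops at the subsection boundary, and the
# file position advances, so the next read picks up where the last left off
with open('mixed.dat', 'rb') as f:
    head = np.fromfile(f, dtype=hdr, count=1)
    n1 = int(head['n1'][0])
    n2 = int(head['n2'][0])
    sub1 = np.fromfile(f, dtype=rec1, count=n1)
    sub2 = np.fromfile(f, dtype=rec2, count=n2)

print(sub1['a'], sub2['c'])
```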

asked Mar 06 '26 by snowleopard

1 Answer

Try:

import numpy as np
import string

# Create some fake data
N = 10
dtype = np.dtype([('a', 'i4'), ('b', 'S8'), ('c', 'f8')])
a = np.zeros(N, dtype)
a['a'] = np.random.randint(0, 4, N)  # random_integers was removed from NumPy
a['b'] = np.array([x for x in string.ascii_lowercase[:N]])
a['c'] = np.random.normal(size=(N,))

# Write to a binary file
a.tofile('test.dat')

# Read data into new array
b = np.fromfile('test.dat', dtype=dtype)

The arrays a and b are identical (i.e. np.all(a['a'] == b['a']) is True):

for col in a.dtype.names:
    print(col, np.all(a[col] == b[col]))

# Prints:
# a True
# b True
# c True

Update:

If you have header information, you can first open the file, seek to the starting point of the data and then read. For example:

with open("test.dat", "rb") as f:
    f.seek(header_size)
    b = np.fromfile(f, dtype=dtype)

You have to know the size (header_size), but then you should be good. If there are subsections, you can supply a count of the number of items to grab. I haven't tested whether count works. If you are not bound to this binary format, I would recommend using something like HDF5 to store multiple arrays in a single file.
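For what it's worth, count does work on an open file handle in current NumPy, and the file position is left just past the records read, so successive calls walk through the file. A quick check with a throwaway file:

```python
import numpy as np

dtype = np.dtype([('a', 'i4'), ('b', 'S8'), ('c', 'f8')])
a = np.zeros(6, dtype)
a['a'] = range(6)
a.tofile('test2.dat')

with open('test2.dat', 'rb') as f:
    first = np.fromfile(f, dtype=dtype, count=2)  # records 0-1
    rest = np.fromfile(f, dtype=dtype, count=4)   # records 2-5

print(first['a'], rest['a'])  # prints [0 1] [2 3 4 5]
```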

answered Mar 07 '26 by JoshAdel


