I'm trying to read a binary file (which represents a matrix in Matlab) in Python, but I am having trouble reading the file and converting the bytes to the correct values.
The binary file consists of a sequence of 4-byte numbers. The first two numbers are the number of rows and columns, respectively. My friend gave me a Matlab function he wrote that produces this file using fwrite. I would like to do something like this:
f = open(filename, 'rb')
rows = f.read(4)
cols = f.read(4)
m = [[0 for c in cols] for r in rows]
r = c = 0
while True:
    if c == cols:
        r += 1
        c = 0
    num = f.read(4)
    if num:
        m[r][c] = num
        c += 1
    else:
        break
But whenever I use f.read(4), I get something like '\x00\x00\x00\x04' (this specific example should represent a 4), and I can't figure out how to convert it into the correct number (int, hex, and the like don't work). I stumbled upon struct.unpack, but that didn't seem to help very much.
Here is an example matrix and the corresponding binary file that the Matlab function created for it (as it appears when I read the entire file using the Python function f.read() without any size parameter):
4 4 2 4
2 2 2 1
3 3 2 4
2 2 6 2
'\x00\x00\x00\x04\x00\x00\x00\x04@\x80\x00\x00@\x00\x00\x00@@\x00\x00@\x00\x00\x00@\x80\x00\x00@\x00\x00\x00@@\x00\x00@\x00\x00\x00@\x00\x00\x00@\x00\x00\x00@\x00\x00\x00@\xc0\x00\x00@\x80\x00\x00?\x80\x00\x00@\x80\x00\x00@\x00\x00\x00'
So the first 4 bytes and the 5th-8th bytes should both be 4, as the matrix is 4x4, and then the values should be 4, 4, 2, 4, 2, 2, 2, 1, etc.
Thanks guys!
After these two reads:
rows = f.read(4)
cols = f.read(4)
both names are bound to 4-byte strings, not integers. To turn them into integers, unpack them with struct:
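import struct
rowsandcols = f.read(8)
# '>' selects big-endian, which matches the example bytes; '=' would use the machine's native byte order
rows, cols = struct.unpack('>ii', rowsandcols)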
See the docs for struct.unpack.
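To see why the byte-order prefix in the format string matters, here is a small sketch that decodes the 4 bytes from the question both ways. The variable names are only illustrative.

```python
import struct

raw = b'\x00\x00\x00\x04'  # the 4 bytes shown in the question

# '>' means big-endian: the most significant byte comes first.
(big,) = struct.unpack('>i', raw)

# '<' means little-endian: the same bytes are read in the opposite order.
(little,) = struct.unpack('<i', raw)

print(big)     # 4
print(little)  # 67108864
```

On a little-endian machine, the native-order prefixes behave like '<', so they would produce the second, wrong value for this file.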
I looked a bit more into your problem, since I had never used struct before and it was a good learning exercise. It turns out there are a couple of twists. First, the matrix entries are not stored as 4-byte integers but as 4-byte floats in big-endian form (only the row and column counts are integers). Second, if your example is correct, the matrix is not stored by rows, as one might expect, but by columns. That is, it was written like this (pseudocode):
for j in cols:
    for i in rows:
        write Aij to file
So I had to transpose the result after reading. Here is the code that you need given the example:
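import struct

def readMatrix(f):
    # Header: two big-endian 32-bit integers (row count, column count).
    rows, cols = struct.unpack('>ii', f.read(8))
    # The data is stored by columns: each chunk of `rows` big-endian floats is one column.
    m = [list(struct.unpack('>%df' % rows, f.read(4 * rows)))
         for c in range(cols)]
    # Transpose the list of columns into a list of rows.
    return list(zip(*m))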
And here we test it:
>>> from io import BytesIO
>>> f = BytesIO(b'\x00\x00\x00\x04\x00\x00\x00\x04@\x80\x00\x00@\x00\x00\x00@@\x00\x00@\x00\x00\x00@\x80\x00\x00@\x00\x00\x00@@\x00\x00@\x00\x00\x00@\x00\x00\x00@\x00\x00\x00@\x00\x00\x00@\xc0\x00\x00@\x80\x00\x00?\x80\x00\x00@\x80\x00\x00@\x00\x00\x00')
>>> mat = readMatrix(f)
>>> for row in mat:
...     print(row)
...
(4.0, 4.0, 2.0, 4.0)
(2.0, 2.0, 2.0, 1.0)
(3.0, 3.0, 2.0, 4.0)
(2.0, 2.0, 6.0, 2.0)
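As a cross-check, the layout can also be reproduced from Python and read back. This is only a sketch of what the Matlab side appears to do, inferred from the example bytes. The names write_matrix and read_matrix are hypothetical and are not the original Matlab function.

```python
import struct
from io import BytesIO

def write_matrix(f, matrix):
    """Write two big-endian int32 values (rows, cols), then float32 values column by column."""
    rows, cols = len(matrix), len(matrix[0])
    f.write(struct.pack('>ii', rows, cols))
    for j in range(cols):          # outer loop over columns
        for i in range(rows):      # inner loop over rows
            f.write(struct.pack('>f', matrix[i][j]))

def read_matrix(f):
    """Read the layout produced by write_matrix and return a list of rows."""
    rows, cols = struct.unpack('>ii', f.read(8))
    columns = [struct.unpack('>%df' % rows, f.read(4 * rows))
               for _ in range(cols)]
    # Transpose the columns back into rows.
    return [list(row) for row in zip(*columns)]

original = [[4, 4, 2, 4],
            [2, 2, 2, 1],
            [3, 3, 2, 4],
            [2, 2, 6, 2]]

buf = BytesIO()
write_matrix(buf, original)
data = buf.getvalue()   # raw bytes, comparable to the dump in the question

buf.seek(0)
restored = read_matrix(buf)
print(restored == original)  # True
```

If the assumptions about the layout hold, data should match the byte string shown in the question, which is a quick way to confirm the column-by-column ordering.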