Writing binary data to a file in Python

I am trying to write data (text and floating-point values) to a binary file that another program will read later. The problem is that this program (written in Fortran 95) is incredibly particular: each byte has to be in exactly the right place for the file to be read correctly. I've tried using bytes objects and .encode() to write the data, but haven't had much luck (I can tell from the file size that extra bytes are being written). Some code I've tried:

mgcnmbr='42'
bts=bytes(mgcnmbr)
test_file=open('PATH_HERE/test_file.dat','ab')
test_file.write(bts)
test_file.close()

I've also tried:

mgcnmbr='42'
bts=mgcnmbr.encode('utf_32_le')
test_file=open('PATH_HERE/test_file.dat','ab')
test_file.write(bts)
test_file.close()

To clarify, what I need is the integer value 42 written as a 4-byte binary value. Next, I would write the numbers 1 and 0, each as a 4-byte binary value. At that point, I should have exactly 12 bytes: three 4-byte signed integers written in binary. I'm pretty new to Python and can't seem to get it to work. Any suggestions? Something like this? I need complete control over how many bytes each integer (and later, each 4-byte floating-point value) occupies.

Thanks

Schafer asked Aug 06 '14 19:08



1 Answer

You need the struct module.

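import struct

# Open in binary write mode; the with block closes the file on exit.
with open('test.dat', 'wb') as fout:
    fout.write(struct.pack('>i', 42))             # 4-byte signed integer
    fout.write(struct.pack('>f', 2.71828182846))  # 4-byte float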

The first argument in struct.pack is the format string.

The first character in the format string dictates the byte order, or endianness, of the data: whether the most significant byte is stored first (big-endian, ">") or the least significant byte is stored first (little-endian, "<"). Endianness varies from system to system. If ">" doesn't work, try "<".
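To make the difference in byte order concrete, here is a minimal sketch that packs the same integer both ways. The variable names are only illustrative.

```python
import struct

# The same 4-byte signed integer packed in both byte orders.
big = struct.pack('>i', 42)     # most significant byte first
little = struct.pack('<i', 42)  # least significant byte first

print(big)     # b'\x00\x00\x00*'
print(little)  # b'*\x00\x00\x00'
```

The byte 0x2a (42) is displayed as "*" because that is its ASCII character. Which of the two layouts the reading program expects depends on the machine and compiler it was built with.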

The second character in the format string is the data type. Unsurprisingly, "i" stands for integer and "f" stands for float. The number of bytes is determined by the type. Shorts ("h"), for example, are two bytes long. There are also codes for unsigned types: "H", for instance, corresponds to an unsigned short.
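As a rough illustration of how the type code fixes the size, the sketch below asks struct.calcsize for the standard size of a few codes (the "<" prefix is an arbitrary choice here) and shows that a signed short cannot hold a value that an unsigned short can.

```python
import struct

# Standard sizes, in bytes, for a few type codes.
sizes = {code: struct.calcsize('<' + code) for code in 'hHifd'}
print(sizes)  # {'h': 2, 'H': 2, 'i': 4, 'f': 4, 'd': 8}

# 40000 is outside the signed short range (-32768 to 32767),
# so 'h' rejects it while the unsigned 'H' accepts it.
try:
    struct.pack('<h', 40000)
    fits_signed = True
except struct.error:
    fits_signed = False

unsigned = struct.pack('<H', 40000)
print(fits_signed, len(unsigned))  # False 2
```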

The second argument in struct.pack is of course the value to be packed into the bytes object.

Here's the part where I tell you that I lied about a couple of things. First, I said that the number of bytes is determined by the type. This is only partially true. The size of a given type is technically platform dependent, as the C and C++ standards (which the struct module is based on) merely specify minimum sizes. This leads me to the second lie. The first character in the format string also encodes whether the standard number of bytes or the native (platform-dependent) number of bytes is to be used. Both ">" and "<" guarantee that the standard size is used, which is four bytes for an integer "i" or a float "f". It additionally encodes the alignment of the data: ">" and "<" insert no padding between fields, whereas the default native mode "@" aligns each field the way the platform's C compiler would.
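The effect of the prefix on size and padding can be checked with struct.calcsize. This is only a sketch: the native result depends on the platform, so no fixed value is claimed for it.

```python
import struct

# Standard mode: a 2-byte short followed by a 4-byte int, no padding.
standard = struct.calcsize('<hi')  # always 6

# Native mode: the int is aligned as the C compiler would align it,
# so padding may be inserted after the short (8 is a common result).
native = struct.calcsize('@hi')    # platform dependent

print(standard, native)
```

For a file that must match another program byte for byte, an explicit ">" or "<" prefix is the safer choice, since it rules out hidden padding.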

The documentation on the struct module has tables for the format string parameters.

You can also pack multiple primitives into a single bytes object and realize the same result.

import struct

with open('test.dat', 'wb') as fout:
    fout.write(struct.pack('>if', 42, 2.71828182846))
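Applied to the case in the question (the integers 42, 1 and 0 as three 4-byte signed integers, 12 bytes in total), a sketch might look like the following. The little-endian "<" prefix is an assumption about what the reading program expects, and the temporary output path is hypothetical, used only to keep the example self-contained.

```python
import os
import struct
import tempfile

# Hypothetical output path inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'test_file.dat')

# '<3i' = little-endian (an assumption), three 4-byte signed integers.
header = struct.pack('<3i', 42, 1, 0)

with open(path, 'wb') as fout:
    fout.write(header)

size = os.path.getsize(path)
print(len(header), size)  # 12 12
```

Checking the file size afterwards, as done here, is a quick way to confirm that no extra bytes were written.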

And you can of course parse binary data with struct.unpack.
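A minimal round-trip sketch, reusing the format string from the example above; the variable names are illustrative. Note that struct.unpack always returns a tuple, and that a 4-byte float keeps only about seven significant digits.

```python
import struct

packed = struct.pack('>if', 42, 2.71828182846)

# unpack needs the same format string that was used for packing.
number, value = struct.unpack('>if', packed)

print(len(packed))      # 8
print(number)           # 42
print(round(value, 5))  # 2.71828
```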

M.J. Rayburn answered Oct 13 '22 10:10
