I have some byte data that wants to be parsed as a stream, since bytes earlier in the sequence control the interpretation of downstream bytes. So BytesIO looks like the thing I want. But I also want to use the facilities provided by the struct module. But struct's interfaces aren't streaming. Is there a clever/idiomatic way to marry the two?
By way of example, here's a chunk of data:
b'\n\x00\x02\x90\x10\x00\n\x00\x02`\x10\x00\n\x00\x02\x80\x10\x00'
I want to pull the first 4 bytes as an unsigned big-endian int (e.g. struct.unpack('>I', ...)). Because the next byte is 0x10, I know there should be one more byte, which turns out to be 0x00. And then it starts over again: read the next 4 (0x0A000260), wash, rinse, repeat. The byte(s) immediately following each 4-byte id trigger a variety of downstream reads (some bytes, some shorts).
I could do things like
    stream = b'\n\x00\x02\x90\x10\x00\n\x00\x02`\x10\x00\n\x00\x02\x80\x10\x00'
    while stream:
        id = struct.unpack('>I', stream[:4])
        stream = stream[4:]
        ...
But that seems less than elegant.
You can unpack them using the same structure string (the '<' + 'I' * elements part), e.g. struct.unpack('<' + 'I' * elements, value).
struct.unpack() always returns a tuple, even when the format describes a single field. It takes a format string and the packed binary data, and parses that data as a C-style structure laid out according to the format.
struct.calcsize('P') returns the number of bytes required to store a single pointer: 4 on a 32-bit system and 8 on a 64-bit system.
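A small sketch of the three notes above may help. The sample values and the names `values`, `single`, and `pointer_size` are made up for illustration; only the `struct` calls themselves come from the standard library.

```python
import struct

# Hypothetical sample data: three unsigned 32-bit little-endian integers.
elements = 3
value = struct.pack('<' + 'I' * elements, 1, 2, 3)

# The same format string that packed the data unpacks it again.
values = struct.unpack('<' + 'I' * elements, value)

# unpack() returns a tuple even when the format has a single field.
single = struct.unpack('<I', value[:4])

# Native pointer size: 4 on a 32-bit build, 8 on a 64-bit build.
pointer_size = struct.calcsize('P')

print(values, single, pointer_size)
```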
What I generally do is:
    import io
    import struct

    def unpack(stream, fmt):
        # Read exactly as many bytes as the format needs, then decode them.
        size = struct.calcsize(fmt)
        buf = stream.read(size)
        return struct.unpack(fmt, buf)
For example:
>>> b = io.BytesIO(b'\n\x00\x02\x90\x10\x00\n\x00\x02`\x10\x00\n\x00\x02\x80\x10\x00')
>>> print(unpack(b, '>I'))
(167772816,)
>>> print(unpack(b, '>I'))
(268438016,)
>>> print(unpack(b, '>I'))
(39849984,)
>>> print(unpack(b, '>I'))
(167772800,)
>>> print(unpack(b, '>H'))
(4096,)
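Applied to the record layout described in the question, the helper might be used roughly like this. This is only a sketch: `parse_records` is a hypothetical name, and the only rule it knows is the one the question states (a type byte of 0x10 is followed by one more byte). Any other type byte is rejected, because the question does not say what follows it.

```python
import io
import struct

def unpack(stream, fmt):
    size = struct.calcsize(fmt)
    buf = stream.read(size)
    return struct.unpack(fmt, buf)

def parse_records(data):
    """Yield (record_id, payload) pairs; only type byte 0x10 is handled."""
    stream = io.BytesIO(data)
    end = len(data)
    while stream.tell() < end:
        # 4-byte big-endian id followed by a 1-byte type code.
        record_id, kind = unpack(stream, '>IB')
        if kind == 0x10:
            # Per the question: 0x10 means exactly one more byte follows.
            (payload,) = unpack(stream, 'B')
        else:
            # Other type codes are not specified in the question.
            raise ValueError(f'unhandled type byte {kind:#04x}')
        yield record_id, payload

data = b'\n\x00\x02\x90\x10\x00\n\x00\x02`\x10\x00\n\x00\x02\x80\x10\x00'
records = list(parse_records(data))
for record_id, payload in records:
    print(hex(record_id), payload)
```

Reading the id and the type byte in one call keeps the stream position in step with the record boundaries, so the loop never has to slice or track an offset.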
If you want to know whether you've consumed the whole stream, you can always just do this:
    buf = stream.read(1)
    if buf:
        raise ValueError("Stream not consumed")
But it's probably simpler to just call the same function you're already using:
>>> def ensure_finished(stream):
...     try:
...         unpack(stream, 'c')
...     except struct.error:
...         pass
...     else:
...         raise ValueError('Stream not consumed')
...
>>> ensure_finished(b)
If you're using a stream that may return fewer than the requested number of bytes, you'll want to use a while loop to keep reading and appending until you hit EOF or have enough bytes. Otherwise, this is all you need.
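A minimal sketch of that read loop, assuming a blocking stream whose `read()` returns an empty bytes object at EOF. `read_exact`, `unpack_exact`, and `TrickleStream` are hypothetical names; `TrickleStream` exists only to simulate short reads.

```python
import io
import struct

def read_exact(stream, size):
    """Read exactly `size` bytes, looping over short reads; raise EOFError at EOF."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            # Empty result means EOF (assumes a blocking stream).
            raise EOFError(f'expected {size} bytes, got {size - remaining}')
        chunks.append(chunk)
        remaining -= len(chunk)
    return b''.join(chunks)

def unpack_exact(stream, fmt):
    return struct.unpack(fmt, read_exact(stream, struct.calcsize(fmt)))

class TrickleStream:
    """Test double that hands back at most one byte per read() call."""
    def __init__(self, data):
        self._inner = io.BytesIO(data)

    def read(self, size):
        return self._inner.read(min(size, 1))

result = unpack_exact(TrickleStream(b'\n\x00\x02\x90'), '>I')
print(result)
```

Raising `EOFError` instead of passing a short buffer to `struct.unpack` is a design choice here: it separates "the stream ended" from "the format string is wrong".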
Use struct's buffer API:
buf = b'\n\x00\x02…'
offset = 0
id = struct.unpack_from('>I', buf, offset); offset += 4
⋮
x = struct.unpack_from('…', buf, offset)
If you want to avoid stating the offset after each operation, you can write a little wrapper, like so:
    class unpacker(object):
        def __init__(self, buf):
            self._buf = buf
            self._offset = 0
        def __call__(self, fmt):
            # Decode at the current offset, then advance past the bytes consumed.
            result = struct.unpack_from(fmt, self._buf, self._offset)
            self._offset += struct.calcsize(fmt)
            return result
⋮
unpack = unpacker(buf)
id = unpack('>I')
⋮
x = unpack('…')