 

Idiomatic way to struct.unpack from BytesIO?

I have some byte data that needs to be parsed as a stream, since bytes earlier in the sequence control the interpretation of downstream bytes. So BytesIO looks like the thing I want. But I also want to use the facilities provided by the struct module, and struct's interfaces aren't streaming. Is there a clever/idiomatic way to marry the two?

Here's an example chunk of data:

b'\n\x00\x02\x90\x10\x00\n\x00\x02`\x10\x00\n\x00\x02\x80\x10\x00'

I want to pull the first 4 bytes as an unsigned big-endian int (e.g. with struct.unpack('>I', ...)). Because the next byte is 0x10, I know there should be one more byte, which turns out to be 0x00. Then it starts over again: read the next 4 (0x0A000260), wash, rinse, repeat. The byte(s) immediately following each 4-byte id trigger a variety of downstream reads (some bytes, some shorts).
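
In this particular sample every record happens to be a 4-byte id, the 0x10 tag byte and one trailing byte, so one format string, '>IBB', covers a whole record. A minimal check, assuming the sample is representative only of itself (the real format is variable-length, so the fixed-size trick below does not generalize):

```python
import struct

data = b'\n\x00\x02\x90\x10\x00\n\x00\x02`\x10\x00\n\x00\x02\x80\x10\x00'

# '>IBB': big-endian unsigned int (4 bytes) then two unsigned bytes; '>' means no padding
RECORD = struct.Struct('>IBB')
print(RECORD.size)  # 6

# iter_unpack only works here because every record in this sample is exactly 6 bytes
records = list(RECORD.iter_unpack(data))
for ident, tag, value in records:
    print(hex(ident), hex(tag), value)
```

Once the record length depends on the tag byte, a fixed format no longer fits and the data has to be consumed piece by piece.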

I could do things like

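import struct

stream = b'\n\x00\x02\x90\x10\x00\n\x00\x02`\x10\x00\n\x00\x02\x80\x10\x00'
while stream:
    (ident,) = struct.unpack('>I', stream[:4])  # unpack always returns a tuple
    stream = stream[4:]
    ...  # downstream reads depend on the byte(s) after the id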

But that seems less than elegant.
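
Besides looking clumsy, each `stream[4:]` copies the rest of the bytes, so the loop does quadratic work on long inputs. Slicing a memoryview shares the underlying buffer instead, and struct.unpack accepts a memoryview directly. A sketch of the same loop, assuming (as in the sample) that a 0x10 tag means exactly one trailing byte; `parse_records` and the tag handling are hypothetical:

```python
import struct

def parse_records(data):
    """Hypothetical parser: 4-byte big-endian id, 1-byte tag, then tag-specific bytes."""
    view = memoryview(data)  # slicing a memoryview does not copy the bytes
    records = []
    while view:  # an empty memoryview is falsy
        (ident,) = struct.unpack('>I', view[:4])
        tag = view[4]  # indexing a bytes-backed memoryview yields an int
        view = view[5:]
        if tag == 0x10:  # assumption: 0x10 means one more unsigned byte follows
            (value,) = struct.unpack('>B', view[:1])
            view = view[1:]
        else:
            raise ValueError(f'unknown tag {tag:#04x}')
        records.append((ident, tag, value))
    return records
```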

asked Jul 08 '13 at 22:07 by Travis Griggs




2 Answers

What I generally do is:

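import io
import struct

def unpack(stream, fmt):
    size = struct.calcsize(fmt)     # bytes needed for this format
    buf = stream.read(size)         # may return fewer bytes at EOF
    return struct.unpack(fmt, buf)  # raises struct.error if buf is short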

For example:

>>> b = io.BytesIO(b'\n\x00\x02\x90\x10\x00\n\x00\x02`\x10\x00\n\x00\x02\x80\x10\x00')
>>> print(unpack(b, '>I'))
(167772816,)
>>> print(unpack(b, '>I'))
(268438016,)
>>> print(unpack(b, '>I'))
(39849984,)
>>> print(unpack(b, '>I'))
(167772800,)
>>> print(unpack(b, '>H'))
(4096,)
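
To tie this back to the record format in the question, the same helper can drive the whole parse, with the tag byte deciding what to read next. A sketch assuming a hypothetical `TRAILERS` table in which 0x10 maps to one unsigned byte; the real tags and their formats would come from the actual protocol:

```python
import io
import struct

def unpack(stream, fmt):
    size = struct.calcsize(fmt)
    return struct.unpack(fmt, stream.read(size))

# Hypothetical tag -> format table; only 0x10 ("one more byte") is known from the sample.
TRAILERS = {0x10: '>B'}

def parse(stream):
    records = []
    while True:
        head = stream.read(4)
        if not head:  # clean EOF between records; a partial id raises struct.error below
            break
        (ident,) = struct.unpack('>I', head)
        (tag,) = unpack(stream, '>B')
        payload = unpack(stream, TRAILERS[tag])  # KeyError for an unknown tag
        records.append((ident, tag) + payload)
    return records

print(parse(io.BytesIO(b'\n\x00\x02\x90\x10\x00\n\x00\x02`\x10\x00\n\x00\x02\x80\x10\x00')))
# [(167772816, 16, 0), (167772768, 16, 0), (167772800, 16, 0)]
```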

If you want to know whether you've consumed the whole stream, you can always just do this:

buf = stream.read(1)
if buf:
    raise ValueError("Stream not consumed")

But it's probably simpler to just call the same function you're already using:

>>> def ensure_finished(stream):
...     try:
...         unpack(stream, 'c')
...     except struct.error:
...         pass
...     else:
...         raise ValueError('Stream not consumed')
>>> ensure_finished(b)
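
For a BytesIO specifically, you can also compare the current position with the buffer length, which doesn't read (or advance past) anything. A minimal sketch; `is_consumed` is a hypothetical name, and it relies on BytesIO.getbuffer(), so it won't work on arbitrary streams:

```python
import io

def is_consumed(stream: io.BytesIO) -> bool:
    """True if the BytesIO's position is at (or past) the end of its data."""
    # getbuffer() returns a memoryview of the contents without copying;
    # the with-block releases it so the BytesIO can be resized again later.
    with stream.getbuffer() as view:
        return stream.tell() >= len(view)
```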

If you're using a stream that may return fewer bytes than requested, you'll want to use a while loop to keep reading and appending until you either hit EOF or have enough bytes. Otherwise, this is all you need.
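
A sketch of that loop, assuming a blocking stream (a non-blocking raw stream can return None, which this treats like EOF). `read_exact`, `unpack_exact` and the `OneByteAtATime` test stream are hypothetical names used only for illustration:

```python
import io
import struct

def read_exact(stream, size):
    """Read exactly `size` bytes, looping over short reads; raise EOFError at EOF."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:  # b'' signals EOF
            raise EOFError(f'expected {size} bytes, got {size - remaining}')
        chunks.append(chunk)
        remaining -= len(chunk)
    return b''.join(chunks)

def unpack_exact(stream, fmt):
    return struct.unpack(fmt, read_exact(stream, struct.calcsize(fmt)))

class OneByteAtATime(io.RawIOBase):
    """Hypothetical raw stream whose reads return at most one byte (for testing)."""
    def __init__(self, data):
        super().__init__()
        self._data = data
        self._pos = 0

    def readable(self):
        return True

    def readinto(self, b):
        if self._pos >= len(self._data):
            return 0  # EOF
        b[0] = self._data[self._pos]
        self._pos += 1
        return 1
```

Here `OneByteAtATime(b'abcd').read(4)` returns only `b'a'`, while `unpack_exact` on the same kind of stream still gets all four bytes.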

answered Nov 15 '22 at 08:11 by abarnert


Use the struct module's buffer API, struct.unpack_from:

import struct

buf = b'\n\x00\x02…'
offset = 0
(ident,) = struct.unpack_from('>I', buf, offset); offset += 4
⋮
x = struct.unpack_from('…', buf, offset)
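
If the hard-coded `offset += 4` feels fragile, a precompiled struct.Struct carries its own size, so the offset can't drift from the format. A sketch on the full sample from the question, reading only the first id and tag (the `ID` and `BYTE` names are illustrative):

```python
import struct

ID = struct.Struct('>I')    # 4-byte big-endian unsigned id
BYTE = struct.Struct('>B')  # single unsigned byte

buf = b'\n\x00\x02\x90\x10\x00\n\x00\x02`\x10\x00\n\x00\x02\x80\x10\x00'
offset = 0
(ident,) = ID.unpack_from(buf, offset)
offset += ID.size    # advance by exactly the size of the format just read
(tag,) = BYTE.unpack_from(buf, offset)
offset += BYTE.size
print(hex(ident), hex(tag), offset)  # 0xa000290 0x10 5
```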

If you want to avoid updating the offset by hand after each operation, you can write a little wrapper, like so:

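import struct

class unpacker(object):
    """Reads successive struct formats from a buffer, tracking the offset."""
    def __init__(self, buf):
        self._buf = buf
        self._offset = 0
    def __call__(self, fmt):
        result = struct.unpack_from(fmt, self._buf, self._offset)
        self._offset += struct.calcsize(fmt)  # advance past what was just read
        return result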

⋮

unpack = unpacker(buf)
(ident,) = unpack('>I')
⋮
x = unpack('…')
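
The wrapper can also report how much of the buffer is left, which gives the buffer-based approach the same "was everything consumed?" check as the stream-based one. A sketch; `Unpacker` and `remaining` are hypothetical names, and the fixed '>IBB' records only suit the sample data:

```python
import struct

class Unpacker:
    """Hypothetical variant of the wrapper above that can report unread bytes."""
    def __init__(self, buf):
        self._buf = buf
        self._offset = 0

    def __call__(self, fmt):
        # unpack_from raises struct.error before the offset moves if bytes run out
        result = struct.unpack_from(fmt, self._buf, self._offset)
        self._offset += struct.calcsize(fmt)
        return result

    @property
    def remaining(self):
        """Number of bytes not yet consumed."""
        return len(self._buf) - self._offset

data = b'\n\x00\x02\x90\x10\x00\n\x00\x02`\x10\x00\n\x00\x02\x80\x10\x00'
unpack = Unpacker(data)
records = []
while unpack.remaining:              # stop exactly at the end of the buffer
    records.append(unpack('>IBB'))   # fixed 6-byte records, as in the sample
```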

answered Nov 15 '22 at 06:11 by Marcelo Cantos