Python struct.unpack(ing) when there are multiple byte-orders?

I have a function that reads a binary file and unpacks its contents using struct.unpack(). It works fine, and it is faster when I unpack the whole file with one long format string. The problem is that the byte order sometimes changes partway through the file, so the format string I'd want, something like '<10sHHb>llh', is invalid (that's just an example; the real ones are usually much longer). Is there any ultra slick/pythonic way of handling this situation?

Jesse Rubin asked Oct 16 '22 06:10


1 Answer

Nothing super-slick, but if speed counts: the struct module's top-level functions are wrappers that re-check a cache on every call for the struct.Struct instance matching the format string. You still need a separate format string for each byte order, but you can recover part of the speed by compiling each one once and skipping that repeated cache check.

Instead of doing:

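import struct

buffer = memoryview(somedata)  # somedata: the bytes read from the file
allresults = []
while buffer:
    # Every call below looks the format string up in the module's cache again
    allresults += struct.unpack_from('<10sHHb', buffer)
    buffer = buffer[struct.calcsize('<10sHHb'):]
    allresults += struct.unpack_from('>llh', buffer)
    buffer = buffer[struct.calcsize('>llh'):]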

You'd do:

import struct

buffer = memoryview(somedata)  # somedata: the bytes read from the file
# Compile each format once, outside the loop
structa = struct.Struct('<10sHHb')
structb = struct.Struct('>llh')
allresults = []
while buffer:
    allresults += structa.unpack_from(buffer)
    buffer = buffer[structa.size:]
    allresults += structb.unpack_from(buffer)
    buffer = buffer[structb.size:]

No, it's not much nicer looking, and the speed gains aren't likely to blow you away. But you've got weird data, so this is the least brittle solution.
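If the real formats are much longer, the same idea can be wrapped in a small helper so you keep writing one combined string. This is only a sketch: compile_mixed and unpack_mixed are made-up names, not part of the struct module, and it assumes every segment starts with an explicit byte-order character.

```python
import re
import struct

def compile_mixed(fmt):
    """Split a format such as '<10sHHb>llh' into one struct.Struct per byte order."""
    # Each piece is a byte-order character followed by everything up to the next one.
    pieces = re.findall(r"[@=<>!][^@=<>!]*", fmt)
    if "".join(pieces) != fmt:
        raise ValueError("format must start with a byte-order character")
    return [struct.Struct(p) for p in pieces]

def unpack_mixed(structs, data, offset=0):
    """Unpack each segment in turn; return (values, offset just past the last segment)."""
    values = []
    for s in structs:
        values.extend(s.unpack_from(data, offset))
        offset += s.size
    return tuple(values), offset

# Compile once, then reuse for every record in the file.
structs = compile_mixed("<10sHHb>llh")
record = b"0123456789" + bytes([1, 0, 2, 0, 3]) + bytes([0, 0, 0, 4, 0, 0, 0, 5, 0, 6])
values, end = unpack_mixed(structs, record)
print(values)  # (b'0123456789', 1, 2, 3, 4, 5, 6)
print(end)     # 25
```

Passing an offset to unpack_from also avoids re-slicing the memoryview on every step. Native mode ('@') segments would bring alignment padding with them, so this is best kept to the standard-size modes.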

If you want unnecessarily clever/brittle solutions, you could do this with custom ctypes Structures, nesting BigEndianStructure(s) inside a LittleEndianStructure or vice versa. For your example format:

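from ctypes import *

# Fixed-width types: struct's standard 'l' is 4 bytes, but c_long is 8 on many 64-bit platforms
class BEStruct(BigEndianStructure):
    _pack_ = 1  # no padding between fields
    _fields_ = [('x', 2 * c_int32), ('y', c_int16)]

class MainStruct(LittleEndianStructure):
    _pack_ = 1  # no padding between fields
    _fields_ = [('a', 10 * c_char), ('b', 2 * c_uint16), ('c', c_int8), ('big', BEStruct)]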

would give you a structure such that you could do:

mystruct = MainStruct()
memoryview(mystruct).cast('B')[:] = bytes(range(25))

and you'd then get results in the expected order, e.g.:

>>> hex(mystruct.b[0])  # Little endian as expected in main struct
'0xb0a'
>>> hex(mystruct.big.x[0]) # Big endian from inner big endian structure
'0xf101112'

While clever in a way, it will likely run slower (ctypes attribute lookup is weirdly slow in my experience). And unlike the struct module functions, you can't unpack into top-level named variables in a single line; it's attribute access all the way.
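To make the attribute-access point concrete, here is a self-contained sketch of loading one 25-byte record from bytes and pulling the fields out. BEPart and Record are illustrative names, and the fields use fixed-width ctypes types on the assumption that the layout should match struct's standard sizes (c_long is 8 bytes on many 64-bit platforms, while 'l' in '<' or '>' mode is 4).

```python
import ctypes

class BEPart(ctypes.BigEndianStructure):
    _pack_ = 1  # no padding, to mirror struct's standard-size modes
    _fields_ = [("x", ctypes.c_int32 * 2), ("y", ctypes.c_int16)]

class Record(ctypes.LittleEndianStructure):
    _pack_ = 1
    _fields_ = [
        ("a", ctypes.c_char * 10),
        ("b", ctypes.c_uint16 * 2),
        ("c", ctypes.c_int8),
        ("big", BEPart),
    ]

size = ctypes.sizeof(Record)  # 25, same as calcsize('<10sHHb') + calcsize('>llh')

# from_buffer_copy builds an instance from a bytes-like object of at least that size.
# The data starts at 1 because a c_char array field reads back truncated at the first NUL.
rec = Record.from_buffer_copy(bytes(range(1, 26)))

# No tuple comes back; each value is a separate attribute or index lookup.
a, b0, b1, c = rec.a, rec.b[0], rec.b[1], rec.c
x0, x1, y = rec.big.x[0], rec.big.x[1], rec.big.y

print(hex(b0))  # 0xc0b       (little endian)
print(hex(x0))  # 0x10111213  (big endian)
```

Compare that with the struct version, where a single unpack call hands back every value as one tuple that can be assigned to names in one line.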

ShadowRanger answered Nov 03 '22 07:11