I have a function that reads a binary file and then unpacks the file's contents using struct.unpack(). My function works just fine, and it is faster when I unpack the whole file using one long format string. The problem is that the byte order sometimes changes partway through, so my format string would look like '<10sHHb>llh', which is invalid (this is just an example; the real ones are usually way longer). Is there any ultra slick/pythonic way of handling this situation?
Nothing super-slick, but if speed counts, the struct module top-level functions are wrappers that have to repeatedly recheck a cache for the actual struct.Struct instance corresponding to the format string; while you must make separate format strings, you might solve part of your speed problem by avoiding that repeated cache check.
Instead of doing:
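import struct

buffer = memoryview(somedata)  # somedata: the raw bytes read from the file
allresults = []
while buffer:
    # Each module-level call looks up the compiled Struct for its format string
    allresults += struct.unpack_from('<10sHHb', buffer)
    buffer = buffer[struct.calcsize('<10sHHb'):]
    allresults += struct.unpack_from('>llh', buffer)
    buffer = buffer[struct.calcsize('>llh'):]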
You'd do:
import struct

buffer = memoryview(somedata)  # somedata: the raw bytes read from the file
# Compile each format once, outside the loop
structa = struct.Struct('<10sHHb')
structb = struct.Struct('>llh')
allresults = []
while buffer:
    allresults += structa.unpack_from(buffer)
    buffer = buffer[structa.size:]
    allresults += structb.unpack_from(buffer)
    buffer = buffer[structb.size:]
No, it's not much nicer looking, and the speed gains aren't likely to blow you away. But you've got weird data, so this is the least brittle solution.
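If the real formats are long, writing each segment out by hand gets tedious. Here is a rough sketch of one way to keep writing a single combined string and split it automatically. Note that compile_mixed and unpack_mixed are hypothetical helpers made up for illustration, not part of the struct module, and the sketch assumes every segment starts with an explicit byte-order character and that the data is a whole number of records.

```python
import re
import struct

# Hypothetical helpers (not part of the struct module): split a combined
# format at each byte-order character and precompile one Struct per piece.
_SEGMENT = re.compile(r'[@=<>!][^@=<>!]*')

def compile_mixed(fmt):
    """Return a list of struct.Struct objects, one per byte-order segment."""
    segments = _SEGMENT.findall(fmt)
    if not segments or ''.join(segments) != fmt:
        raise ValueError('format must start with a byte-order character')
    return [struct.Struct(seg) for seg in segments]

def unpack_mixed(structs, data):
    """Unpack repeated records from data and return a flat list of values."""
    record_size = sum(s.size for s in structs)
    if len(data) % record_size:
        raise ValueError('data length is not a multiple of the record size')
    results = []
    offset = 0
    while offset < len(data):
        for s in structs:
            # Passing an offset avoids re-slicing the buffer on every step
            results += s.unpack_from(data, offset)
            offset += s.size
    return results

# Build two sample records in the mixed layout, then read them back
structs = compile_mixed('<10sHHb>llh')
record = (struct.pack('<10sHHb', b'abcdefghij', 1, 2, -3)
          + struct.pack('>llh', 4, -5, 6))
values = unpack_mixed(structs, record * 2)
```

The Struct objects are still compiled only once, so this should keep the benefit described above; whether it is actually faster for your files is something you'd have to measure.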
If you want unnecessarily clever/brittle solutions, you could do this with ctypes custom Structures, nesting BigEndianStructure(s) inside a LittleEndianStructure or vice-versa. For your example format:
from ctypes import *
class BEStruct(BigEndianStructure):
    # Fixed-width types match struct's standard 'l'/'h' sizes; c_long is 8 bytes on many 64-bit platforms
    _fields_ = [('x', 2 * c_int32), ('y', c_int16)]
    _pack_ = 1  # pack fields with no padding
class MainStruct(LittleEndianStructure):
    _fields_ = [('a', 10 * c_char), ('b', 2 * c_uint16), ('c', c_byte), ('big', BEStruct)]
    _pack_ = 1
would give you a structure such that you could do:
mystruct = MainStruct()
memoryview(mystruct).cast('B')[:] = bytes(range(25))
and you'd then get results in the expected order, e.g.:
>>> hex(mystruct.b[0]) # Little endian as expected in main struct
'0xb0a'
>>> hex(mystruct.big.x[0]) # Big endian from inner big endian structure
'0xf101112'
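If you go this route, it may be worth cross-checking the ctypes layout against the struct formats, since a size mismatch is easy to miss. The sketch below is a self-contained variant that assumes fixed-width types (c_int32, c_int16, c_uint16, c_int8) as stand-ins for struct's standard-size codes, and it loads the bytes with from_buffer_copy instead of the memoryview cast; both choices are mine for illustration.

```python
import ctypes
import struct

class BEStruct(ctypes.BigEndianStructure):
    _pack_ = 1  # no padding between fields
    _fields_ = [('x', 2 * ctypes.c_int32), ('y', ctypes.c_int16)]

class MainStruct(ctypes.LittleEndianStructure):
    _pack_ = 1
    _fields_ = [('a', 10 * ctypes.c_char),
                ('b', 2 * ctypes.c_uint16),
                ('c', ctypes.c_int8),
                ('big', BEStruct)]

# The packed structure should be exactly as large as the two struct formats
expected_size = struct.calcsize('<10sHHb') + struct.calcsize('>llh')
layout_matches = ctypes.sizeof(MainStruct) == expected_size

# Copy 25 sample bytes into a new instance and read fields by attribute
mystruct = MainStruct.from_buffer_copy(bytes(range(25)))
b0 = mystruct.b[0]       # bytes 10-11, little endian
x0 = mystruct.big.x[0]   # bytes 15-18, big endian
```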
While clever in a way, it's likely it will run slower (ctypes attribute lookup is weirdly slow in my experience), and unlike struct module functions, you can't just unpack into top-level named variables in a single line; it's attribute access all the way.
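For comparison, this is roughly what the single-line unpacking looks like with the struct approach. The variable names and the Record namedtuple are made up for the example; the sample record is built with pack just so the snippet runs on its own.

```python
import struct
from collections import namedtuple

head = struct.Struct('<10sHHb')
tail = struct.Struct('>llh')

# Sample record in the mixed layout: little-endian head, big-endian tail
record = head.pack(b'abcdefghij', 1, 2, -3) + tail.pack(4, -5, 6)

# One line per Struct binds every field to a plain local name
name, b0, b1, c = head.unpack_from(record)
x0, x1, y = tail.unpack_from(record, head.size)

# Optional: a namedtuple (hypothetical field names) adds attribute access
Record = namedtuple('Record', 'name b0 b1 c x0 x1 y')
rec = Record(*head.unpack_from(record), *tail.unpack_from(record, head.size))
```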