
Python struct error

Tags: python, struct

I'm trying to design a system to react to different binary flags.

0 = Error
1 = Okay
2 = Logging
3 = Number

The sequence of this data represents a unique ID to reference the work, the flag and the number. Everything works, except the number flag. This is what I get...

>>> import struct
>>> data = (1234, 3, 12345678)
>>> bin = struct.pack('QHL', *data)
>>> print(bin)
b'\xd2\x04\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00Na\xbc\x00\x00\x00\x00\x00'
>>> result = struct.unpack_from('QH', bin, 0)
>>> print(result)
(1234, 3)
>>> offset = struct.calcsize('QH')
>>> result += struct.unpack_from('L', bin, offset)
>>> print(result)
(1234, 3, 7011541669862440960)

A long should be plenty big to represent the number 12345678, but why is it incorrectly unpacked?

Edit:

When I try to pack them separately, it looks like struct is adding too many null bytes between the flag and the long.

>>> import struct
>>> struct.pack('QH', 1234, 3)
b'\xd2\x04\x00\x00\x00\x00\x00\x00\x03\x00'
>>> struct.pack('L', 12345678)
b'Na\xbc\x00\x00\x00\x00\x00'

I can reproduce this error by adding padding before the long.

>>> struct.unpack('L', struct.pack('L', 12345678))
(12345678,)
>>> struct.unpack('xL', struct.pack('xL', 12345678))
(12345678,)
>>> struct.pack('xL', 12345678)
b'\x00\x00\x00\x00\x00\x00\x00\x00Na\xbc\x00\x00\x00\x00\x00'

Potential fix?

When I use little-endian order, the problem seems to correct itself and the binary string gets shorter. Since this is destined for an SSL-wrapped TCP socket, that's a win-win, right? Keeping bandwidth low is generally good, yes?

>>> import struct
>>> data = (1234, 3, 12345678)
>>> bin = struct.pack('<QHL', *data)
>>> print(bin)
b'\xd2\x04\x00\x00\x00\x00\x00\x00\x03\x00Na\xbc\x00'
>>> result = struct.unpack_from('<QH', bin, 0)
>>> print(result)
(1234, 3)
>>> offset = struct.calcsize('<QH')
>>> result += struct.unpack_from('<L', bin, offset)
>>> print(result)
(1234, 3, 12345678)

Why does this happen? I am perplexed.

asked Jan 07 '23 20:01 by bkvaluemeal


2 Answers

You are running into byte alignment issues. By default, the individual parts of a struct are not simply placed next to each other; they are aligned in memory the way a C compiler would lay them out. This makes access more efficient, especially for other applications, because each value can be read directly at its natural boundary instead of straddling two blocks.

You can see this by using struct.calcsize, which reports how many bytes a format needs:

>>> struct.calcsize('QHL')
16
>>> struct.calcsize('QH')
10

As you can see, QHL requires 16 bytes, but QH requires 10. The L we left off is, however, only 4 bytes wide here. So some padding is added to make sure that the L starts on “a fresh block”. This is because, with native alignment, each value has to start at an offset that is a multiple of its alignment, which is typically its own size. For QH it looks like this:

QQ QQ | QQ QQ | HH

Once you use QHL, you get the following:

QQ QQ | QQ QQ | HH 00 | LL LL

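As you can see, two padding bytes were added to make sure that the L starts on a new block of four. (On your machine the native L is 8 bytes wide, so it is aligned to a multiple of eight and six padding bytes are inserted, which is what your 24-byte output shows.)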
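
The padding can be made visible by computing where each field starts. The helper below is a minimal sketch and not part of the struct module: field_offsets is a hypothetical name, and it only handles formats made of single-character codes without repeat counts. It relies on struct adding no trailing padding, so the end of a prefix minus the size of its last field is that field's offset.

```python
import struct

def field_offsets(fmt):
    """Return the start offset of each field in a simple struct format.

    Hypothetical helper: supports an optional byte-order prefix followed by
    single-character codes only (no repeat counts such as '4s').
    """
    prefix = fmt[0] if fmt and fmt[0] in "@=<>!" else ""
    codes = fmt[len(prefix):]
    offsets = []
    for i, code in enumerate(codes):
        # End of field i within the format up to and including that field.
        end = struct.calcsize(prefix + codes[:i + 1])
        # No trailing padding is added, so start = end - size of the field.
        offsets.append(end - struct.calcsize(prefix + code))
    return offsets

print(field_offsets("QHL"))   # native alignment: result depends on the platform
print(field_offsets("=QHL"))  # standard sizes, no alignment: [0, 8, 10]
```

With the native format, the third offset is larger than 10 whenever padding was inserted before the L.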

You can modify the alignment (as well as the endianness) using a special character at the beginning of the format string. In your case, you could use =QHL to disable alignment altogether:

QQ QQ | QQ QQ | HH LL | LL

“When I use little-endian order, the problem seems to correct itself and the binary string gets shorter. Since this is destined for an SSL-wrapped TCP socket, that's a win-win, right? Keeping bandwidth low is generally good, yes?”

Yes, using an explicit byte order also disables alignment, so that’s where the effect comes from. Whether it is a good idea to turn off alignment depends, though. If you want to consume your data somewhere else, in other programs that read it as native C structs, it would be a good idea to stick to native alignment.
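
For a wire format such as the TCP socket in the question, one option is to fix the layout with an explicit prefix so that both ends agree regardless of platform. A minimal sketch, assuming network byte order ('!') is acceptable for the protocol; the name RECORD is only illustrative:

```python
import struct

# '!' selects network (big-endian) byte order, standard sizes, and no padding.
RECORD = struct.Struct("!QHL")

payload = RECORD.pack(1234, 3, 12345678)
print(RECORD.size)             # 14 on every platform: 8 + 2 + 4
print(RECORD.unpack(payload))  # (1234, 3, 12345678)
```

With standard sizes, L is always 4 bytes, so the number has to fit in 32 bits; Q would be the code to use if larger values are needed.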

answered Jan 19 '23 11:01 by poke


You get the correct output if you read the L from offset 16 instead of from calcsize('QH'), which is 10:

>>> import struct
>>> data = (1234, 3, 12345678)
>>> bin = struct.pack('QHL', *data)
>>> print(bin)
b'\xd2\x04\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00Na\xbc\x00\x00\x00\x00\x00'
>>> result = struct.unpack_from('QH', bin, 0)
>>> print(result)
(1234, 3)
>>> result += struct.unpack_from('L', bin, 16)
>>> print(result)
(1234, 3, 12345678)

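This happens because, as the struct documentation notes:

Padding is only automatically added between successive structure members.

So 'QH' on its own ends right after the H and calcsize('QH') is 10, whereas inside 'QHL' the L is aligned first and, on your machine, starts at offset 16.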

Also, the reason your fix works is:

No padding is added when using non-native size and alignment, e.g. with ‘<’, ‘>’, ‘=’, and ‘!’.
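
If the native layout has to stay, the offset of the L can be computed rather than hard-coded. A minimal sketch using the zero repeat count described in the struct documentation, which pads the end of a format to the alignment of the given type:

```python
import struct

data = (1234, 3, 12345678)
packed = struct.pack("QHL", *data)

# 'QH0L' means Q, H, then zero L values: the end is padded to L's alignment,
# so calcsize gives the offset at which the L starts inside 'QHL'.
offset = struct.calcsize("QH0L")

result = struct.unpack_from("QH", packed, 0)
result += struct.unpack_from("L", packed, offset)
print(result)  # (1234, 3, 12345678)
```

The value of offset itself is platform-dependent, which is why it is computed here instead of written as a literal.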

answered Jan 19 '23 10:01 by Capt Planet