
Python struct calcsize different from actual


I am trying to read one short and one long from a binary file using Python's struct module.

But the size reported for the two together is 16:

print(struct.calcsize("hl")) # o/p 16

That looks wrong to me: it should be 10, that is 2 bytes for the short plus 8 bytes for the long. I am not sure whether I am using the struct module the wrong way.

When I print the size of each type separately, I get:

print(struct.calcsize("h")) # o/p 2
print(struct.calcsize("l")) # o/p 8

Is there a way to force Python to keep the exact size of each data type, without the extra bytes?

Ivin Polo Sony asked Jan 24 '18 14:01




2 Answers

Under the default struct alignment rules, 16 is the correct answer. Each field is aligned to a multiple of its own size, so you end up with two bytes for the short, then six bytes of padding (to reach the next offset that is a multiple of eight bytes), then eight bytes for the long.
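To make the arithmetic concrete, here is a rough sketch that walks the fields and adds padding by hand. The helper name native_size is made up for illustration, and it assumes each type's alignment equals its size, which holds for h, i and l on common platforms but is not guaranteed everywhere.

```python
import struct

def native_size(fmt):
    """Hypothetical helper: estimate the native size of a simple format string.

    Assumes every field is aligned to a multiple of its own size.
    """
    offset = 0
    for code in fmt:
        size = struct.calcsize(code)
        padding = -offset % size  # bytes needed to reach the next aligned offset
        offset += padding + size
    return offset  # like struct, no padding is added after the last field

# On a platform with 8-byte longs this is 2 + 6 (padding) + 8.
print(native_size("hl"), struct.calcsize("hl"))
```

The two printed numbers should agree for simple formats like this one; the real struct module remains the authority.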

You can use a byte order prefix (any of =, <, > or ! disables padding), but these also disable machine-native sizes (so struct.calcsize("=l") is a fixed 4 bytes on all systems, and struct.calcsize("=hl") is 6 bytes on all systems, not 10, even on systems with 8-byte longs).
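A quick way to see the effect of each prefix is to print the size of "hl" under all of them. This is only a small illustration; the sizes dict is a name chosen here, and the "@" value depends on the platform.

```python
import struct

# "@" is native size and alignment (the default); the other prefixes use
# standard sizes with no padding.
sizes = {prefix: struct.calcsize(prefix + "hl") for prefix in "@=<>!"}

for prefix, size in sizes.items():
    print(repr(prefix), size)
```

With standard sizes, h is 2 bytes and l is 4 bytes, so every prefix except "@" reports 6.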

If you want to compute struct sizes for arbitrary structures using machine native types with non-default padding rules, you'll need to go to the ctypes module, define your ctypes.Structure subclass with the desired _pack_ setting, then use ctypes.sizeof to check the size, e.g.:

from ctypes import Structure, c_long, c_short, sizeof

class HL(Structure):
    _pack_ = 1  # Disables padding for field alignment
    # Defines (unnamed) fields, a short followed by long
    _fields_ = [("", c_short),
               ("", c_long)]

print(sizeof(HL))

which outputs 10 as desired (on a platform where a C long is 8 bytes).

This could be factored out as a utility function if needed (this is a simplified example that doesn't handle all struct format codes, but you can expand if needed):

from ctypes import *

FMT_TO_TYPE = dict(zip("cb?hHiIlLqQnNfd",
                       (c_char, c_byte, c_bool, c_short, c_ushort, c_int, c_uint,
                        c_long, c_ulong, c_longlong, c_ulonglong, 
                        c_ssize_t, c_size_t, c_float, c_double)))

def calcsize(fmt, pack=None):
    '''Compute size of a format string with arbitrary padding (defaults to native)'''
    class _(Structure):
        if pack is not None:
            _pack_ = pack
        _fields_ = [("", FMT_TO_TYPE[c]) for c in fmt]
    return sizeof(_)

which, once defined, lets you compute sizes padded or unpadded like so:

>>> calcsize("hl")     # Defaults to native "natural" alignment padding
16
>>> calcsize("hl", 1)  # pack=1 means no alignment padding between members
10
ShadowRanger answered Sep 21 '22 13:09


This is what the doc says:

By default, the result of packing a given C struct includes pad bytes in order to maintain proper alignment for the C types involved; similarly, alignment is taken into account when unpacking. This behavior is chosen so that the bytes of a packed struct correspond exactly to the layout in memory of the corresponding C struct. To handle platform-independent data formats or omit implicit pad bytes, use standard size and alignment instead of native size and alignment

Switching from native to standard mode is pretty easy: you just put the prefix = before the format characters.

print(struct.calcsize("=hl")) # o/p 6

EDIT

Since switching from native to standard mode changes some default sizes (l becomes 4 bytes), you have two options:

  • Keep native mode but swap the format characters: struct.calcsize("lh"). As in C, the order of the members inside the struct matters. The long has 8-byte alignment here, meaning it must start at an offset that is a multiple of 8. Putting it first means no padding is needed before the short, and struct adds no padding after the last field, so the result is 10.

  • Use the format characters of standard mode, where q is always 8 bytes: struct.calcsize("=hq")
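Both options can be checked with a short script. The sketch below also reads one record back, using an in-memory io.BytesIO as a stand-in for the binary file; the little-endian "<" prefix and the sample values are assumptions, since the question does not say how the file was written.

```python
import io
import struct

# Option 1: native mode with the long first. No padding is needed before
# the short, and struct adds none after the last field.
native_reordered = struct.calcsize("lh")

# Option 2: standard mode, with q (always 8 bytes) in place of l.
RECORD = struct.Struct("<hq")  # "<" (little-endian) is an assumption about the file

# Stand-in for the binary file: one short followed by one 8-byte integer.
fake_file = io.BytesIO(RECORD.pack(-3, 2**40))
short_value, long_value = RECORD.unpack(fake_file.read(RECORD.size))

print(native_reordered, RECORD.size, short_value, long_value)
```

If you do want the trailing padding that C's sizeof would report, ending the native format with a zero-count code, as in "lh0l", pads the end to the long's alignment.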

Marco Luzzara answered Sep 23 '22 13:09