
Getting elements from ctype structure with introspection?

Tags:

python

ctypes

I couldn't find anything that helps with this kind of problem: I'm trying to get the offset of an attribute that is part of nested structures such as:

data_types.py

import ctypes


class FirstStructure(ctypes.Structure):
    _fields_ = [('Junk', ctypes.c_bool),
                ('ThisOneIWantToGet', ctypes.c_int8)
                ]


class SecondStructure(ctypes.Structure):
    _fields_ = [('Junk', ctypes.c_double),
                ('Example', FirstStructure)
                ]

An important detail: I know only the name of the parent structure, SecondStructure, and I have absolutely no idea how many levels of nested structures it contains.

What I want is the offset of the ThisOneIWantToGet attribute from the beginning of SecondStructure.

I know there is the ctypes.addressof function, which works on ctypes objects. Is there a simple way to get the object for a nested member, so that I could do something like this:

do_something.py

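import ctypes
import data_types as dt

# Desired usage (does not work as written): 'ThisOneIWantToGet' is not a
# direct attribute of SecondStructure, it lives in the nested structure
par_struct_obj = getattr(dt, 'SecondStructure')()
par_obj = getattr(par_struct_obj, 'ThisOneIWantToGet')
print(ctypes.addressof(par_obj) - ctypes.addressof(par_struct_obj))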

Sir DrinksCoffeeALot asked May 12 '18 09:05


1 Answer

I'm going to start by pointing to the official ctypes documentation: [Python 3.5]: ctypes - A foreign function library for Python.

I defined a bit more complex structure tree (2 nesting levels).

data_types.py:

import ctypes


PRAGMA_PACK = 0


class Struct2(ctypes.Structure):
    if PRAGMA_PACK:
        _pack_ = PRAGMA_PACK
    _fields_ = [
        ("c_0", ctypes.c_char),  # 1B
        ("s_0", ctypes.c_short),  # 2B
        ("wanted", ctypes.c_int), # 4B
    ]


class Struct1(ctypes.Structure):
    if PRAGMA_PACK:
        _pack_ = PRAGMA_PACK
    _fields_ = [
        ("d_0", ctypes.c_double),  # 8B
        ("c_0", ctypes.c_char),  # 1B
        ("struct2_0", Struct2),
    ]


class Struct0(ctypes.Structure):
    if PRAGMA_PACK:
        _pack_ = PRAGMA_PACK
    _fields_ = [
        ("i_0", ctypes.c_int),  # 4B
        ("s_0", ctypes.c_short),  # 2B
        ("struct1_0", Struct1),
    ]

Notes:

  • I named the member of interest wanted (part of Struct2 which is the deepest one)
  • One important thing to consider when dealing with structs is alignment. Check [MSDN]: #pragma pack for more details.
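
As a quick preview of the alignment point, here is a minimal sketch that declares the same two fields with and without _pack_. The class names DefaultLayout and PackedLayout are made up for illustration, and the sketch assumes a platform where c_int is 4 bytes long with 4 byte alignment:

```python
import ctypes


class DefaultLayout(ctypes.Structure):
    # Natural alignment: padding is inserted after "c" so that "i" is aligned
    _fields_ = [("c", ctypes.c_char), ("i", ctypes.c_int)]


class PackedLayout(ctypes.Structure):
    # Rough equivalent of "#pragma pack(1)": no padding between members
    _pack_ = 1
    _fields_ = [("c", ctypes.c_char), ("i", ctypes.c_int)]


for struct_type in (DefaultLayout, PackedLayout):
    print("{:s}: alignment {:d}, size {:d}".format(
        struct_type.__name__,
        ctypes.alignment(struct_type),
        ctypes.sizeof(struct_type)))
```

Under that assumption the packed variant should be 3 bytes smaller, because the padding between the char and the int disappears.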

In order to illustrate the 2nd bullet (above), I prepared a small example (which has nothing to do with the question).

test_addressof.py:

import sys
import ctypes
import data_types


OFFSET_TEXT = "Offset of '{:s}' member in '{:s}' instance: {:3d} (0x{:08X})"


def offset_addressof(child_structure_instance, parent_structure_instance):
    return ctypes.addressof(child_structure_instance) - ctypes.addressof(parent_structure_instance)


def print_offset_addressof_data(child_structure_instance, parent_structure_instance):
    offset = offset_addressof(child_structure_instance, parent_structure_instance)
    print(OFFSET_TEXT.format(child_structure_instance.__class__.__name__, parent_structure_instance.__class__.__name__, offset, offset))


def main():
    s0 = data_types.Struct0()
    s1 = s0.struct1_0
    s2 = s1.struct2_0
    print("PRAGMA_PACK: {:d} {:s}\n".format(data_types.PRAGMA_PACK, "" if data_types.PRAGMA_PACK else "(default)"))
    print_offset_addressof_data(s1, s0)
    print_offset_addressof_data(s2, s1)
    print_offset_addressof_data(s2, s0)
    print("\nAlignments and sizes:\n\t'{:s}': {:3d} - {:3d}\n\t'{:s}': {:3d} - {:3d}\n\t'{:s}': {:3d} - {:3d}".format(
            s0.__class__.__name__, ctypes.alignment(s0), ctypes.sizeof(s0),
            s1.__class__.__name__, ctypes.alignment(s1), ctypes.sizeof(s1),
            s2.__class__.__name__, ctypes.alignment(s2), ctypes.sizeof(s2)
        )
    )
    #print("Struct0().i_0 type: {:s}".format(s0.i_0.__class__.__name__))


if __name__ == "__main__":
    print("Python {:s} on {:s}\n".format(sys.version, sys.platform))
    main()

Notes:

  • In a ctypes.Structure, members of native C types are converted to Python types when accessed, and ctypes.addressof raises TypeError when given such an argument (check the commented print from main)
  • I tried to use C types that have the same size across various OSes (e.g. I avoided ctypes.c_long, which is 8 bytes long on Linux and 4 bytes long on Windows (talking about 64 bit versions, of course))
  • Source modification is required between the 2 example runs. I could have generated the classes dynamically, but that would have added unnecessary complexity to the code (and moved away from the point I'm trying to make)
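
The first note can be checked in isolation. The following is a small sketch (Inner and Outer are hypothetical structures, not part of data_types.py) showing that a nested structure member can be passed to ctypes.addressof, while a simple member comes back as a plain Python int and cannot:

```python
import ctypes


class Inner(ctypes.Structure):
    _fields_ = [("x", ctypes.c_int)]


class Outer(ctypes.Structure):
    _fields_ = [("n", ctypes.c_int), ("inner", Inner)]


outer = Outer()

# A nested structure member is still a ctypes object that lives inside
# outer's buffer, so subtracting the two addresses gives its offset
nested_offset = ctypes.addressof(outer.inner) - ctypes.addressof(outer)
print("inner offset:", nested_offset)

# A simple member is returned as a plain Python int, which has no address
print("type of outer.n:", type(outer.n).__name__)
try:
    ctypes.addressof(outer.n)
except TypeError as exc:
    print("TypeError:", exc)
```

This is why the address subtraction approach only works between structures, and why the offset of a simple member inside its own structure has to be obtained some other way.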

Output:

(py35x64_test) e:\Work\Dev\StackOverflow\q050304516>python test_addressof.py
Python 3.5.4 (v3.5.4:3f56838, Aug  8 2017, 02:17:05) [MSC v.1900 64 bit (AMD64)] on win32

PRAGMA_PACK: 0 (default)

Offset of 'Struct1' member in 'Struct0' instance:   8 (0x00000008)
Offset of 'Struct2' member in 'Struct1' instance:  12 (0x0000000C)
Offset of 'Struct2' member in 'Struct0' instance:  20 (0x00000014)

Alignments and sizes:
        'Struct0':   8 -  32
        'Struct1':   8 -  24
        'Struct2':   4 -   8

(py35x64_test) e:\Work\Dev\StackOverflow\q050304516>rem change PRAGMA_PACK = 1 in data_types.py

(py35x64_test) e:\Work\Dev\StackOverflow\q050304516>python test_addressof.py
Python 3.5.4 (v3.5.4:3f56838, Aug  8 2017, 02:17:05) [MSC v.1900 64 bit (AMD64)] on win32

PRAGMA_PACK: 1

Offset of 'Struct1' member in 'Struct0' instance:   6 (0x00000006)
Offset of 'Struct2' member in 'Struct1' instance:   9 (0x00000009)
Offset of 'Struct2' member in 'Struct0' instance:  15 (0x0000000F)

Alignments and sizes:
        'Struct0':   1 -  22
        'Struct1':   1 -  16
        'Struct2':   1 -   7


struct_util.py:

import sys
import ctypes

import data_types


WANTED_MEMBER_NAME = "wanted"
FIELDS_MEMBER_NAME = "_fields_"


def _get_padded_size(sizes, align_size):
    padded_size = temp = 0
    for size in sizes:
        if temp >= align_size:
            padded_size += temp
            temp = size
        elif temp + size > align_size:
            padded_size += align_size
            temp = size
        else:
            temp += size
    if temp:
        padded_size += max(size, align_size)
    return padded_size


def _get_array_type_sizes(array_type):
    if issubclass(array_type._type_, ctypes.Array):
        return _get_array_type_sizes(array_type._type_) * array_type._type_._length_
    else:
        return [array_type._type_] * array_type._length_


def get_nested_offset_recursive(struct_instance, wanted_member_name):
    if not isinstance(struct_instance, ctypes.Structure):
        return -1
    align_size = ctypes.alignment(struct_instance)
    base_address = ctypes.addressof(struct_instance)
    member_sizes = list()
    for member_name, member_type in getattr(struct_instance, FIELDS_MEMBER_NAME, list()):
        if member_name == wanted_member_name:
            return _get_padded_size(member_sizes, align_size)
        if issubclass(member_type, ctypes.Structure):
            nested_struct_instance = getattr(struct_instance, member_name)
            inner_offset = get_nested_offset_recursive(nested_struct_instance, wanted_member_name)
            if inner_offset != -1:
                return ctypes.addressof(nested_struct_instance) - base_address + inner_offset
            else:
                member_sizes.append(ctypes.sizeof(member_type))
        else:
            if issubclass(member_type, ctypes.Array):
                member_sizes.extend(_get_array_type_sizes(member_type))
            else:
                member_sizes.append(ctypes.sizeof(member_type))
    return -1


def _get_struct_instance_from_name(struct_name):
    struct_class = getattr(data_types, struct_name, None)
    if struct_class:
        return struct_class()


def get_nested_offset(struct_name, wanted_member_name):
    struct_instance = _get_struct_instance_from_name(struct_name)
    return get_nested_offset_recursive(struct_instance, wanted_member_name)


def main():
    struct_names = [
        "Struct2",
        "Struct1",
        "Struct0"
    ]
    wanted_member_name = WANTED_MEMBER_NAME
    print("PRAGMA_PACK: {:d} {:s}\n".format(data_types.PRAGMA_PACK, "" if data_types.PRAGMA_PACK else "(default)"))
    for struct_name in struct_names:
        print("'{:s}' offset in '{:s}' (size: {:3d}): {:3d}".format(wanted_member_name,
                                                                    struct_name,
                                                                    ctypes.sizeof(_get_struct_instance_from_name(struct_name)),
                                                                    get_nested_offset(struct_name, wanted_member_name)))


if __name__ == "__main__":
    print("Python {:s} on {:s}\n".format(sys.version, sys.platform))
    main()

Notes:

  • The code is (waaaay) more complex than I initially anticipated. I think there is a simpler way, but I just can't see it. Hopefully I'm not missing something so obvious that the whole thing could be done in 2 - 3 lines of code
  • It's supposed to work with any structures, although there are (many) cases that I didn't test (especially arrays of structs, where there are cases that won't work)
  • It will stop at the 1st member occurrence found
  • Functions (1 by 1):
    • get_nested_offset_recursive - core function: recursively searches for the member in the structures and calculates its offset. There are 2 cases:
      • Member is in a child structure (or child's child, ...): offset to child structure is calculated by subtracting the 2 structures addresses (using ctypes.addressof)
      • Member is in current structure (complex case): offset is calculated considering the sizes of the members before it, and structure alignment
    • _get_padded_size - tries to fit member sizes (before the one that we care about) in align_size large chunks, and returns the sum of the chunk sizes
    • _get_array_type_sizes - arrays are not atomic (from alignment PoV): a char c[10]; member could be replaced by char c0, c1, ..., c9;. This is what this function does (recursively)
    • _get_struct_instance_from_name - helper or convenience function: returns an instance of the structure name (searched in data_types module) given as argument
    • get_nested_offset - wrapper function
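
The remark that arrays are not atomic from the alignment point of view can be illustrated with a short sketch. WithArray and WithScalars are hypothetical structures made up for this example; it assumes c_int is 4 bytes long with 4 byte alignment:

```python
import ctypes

CharArray10 = ctypes.c_char * 10


class WithArray(ctypes.Structure):
    _fields_ = [("c", CharArray10), ("i", ctypes.c_int)]


class WithScalars(ctypes.Structure):
    # The same layout spelled out as ten separate chars followed by an int
    _fields_ = [("c{:d}".format(idx), ctypes.c_char) for idx in range(10)] + [
        ("i", ctypes.c_int)]


# The array is aligned like its element type, not like a 10 byte block
print(ctypes.alignment(CharArray10), ctypes.alignment(ctypes.c_char))

# Both declarations end up with the same total size
print(ctypes.sizeof(WithArray), ctypes.sizeof(WithScalars))
```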

Output (same principle as above):

(py35x64_test) e:\Work\Dev\StackOverflow\q050304516>python struct_util.py
Python 3.5.4 (v3.5.4:3f56838, Aug  8 2017, 02:17:05) [MSC v.1900 64 bit (AMD64)] on win32

PRAGMA_PACK: 0 (default)

'wanted' offset in 'Struct2' (size:   8):   4
'wanted' offset in 'Struct1' (size:  24):  16
'wanted' offset in 'Struct0' (size:  32):  24

(py35x64_test) e:\Work\Dev\StackOverflow\q050304516>rem change PRAGMA_PACK = 1 in data_types.py

(py35x64_test) e:\Work\Dev\StackOverflow\q050304516>python struct_util.py
Python 3.5.4 (v3.5.4:3f56838, Aug  8 2017, 02:17:05) [MSC v.1900 64 bit (AMD64)] on win32

PRAGMA_PACK: 1

'wanted' offset in 'Struct2' (size:   7):   3
'wanted' offset in 'Struct1' (size:  16):  12
'wanted' offset in 'Struct0' (size:  22):  18

@EDIT0:

As I specified in the 1st and (especially) the 2nd notes, I wasn't happy with the solution, mainly because even if it works for the current scenario, it doesn't for the general one (nesting arrays and structures). Then I came across [SO]: Ctypes: Get a pointer to a struct field (@MarkTolonen's answer), and took a different approach.
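
The different approach relies on the field descriptors that ctypes attaches to the structure class. Here is a minimal sketch of the idea, using hypothetical Inner and Outer structures (not the ones from data_types.py) and assuming c_short is 2 bytes and c_int is 4 bytes with 4 byte alignment:

```python
import ctypes


class Inner(ctypes.Structure):
    _fields_ = [("flag", ctypes.c_char), ("value", ctypes.c_int)]


class Outer(ctypes.Structure):
    _fields_ = [("tag", ctypes.c_short), ("inner", Inner)]


# Accessing a field on the class (not on an instance) returns a descriptor
# that carries the field's offset and size within that structure
print(Outer.inner.offset, Outer.inner.size)
print(Inner.value.offset, Inner.value.size)

# Offset of "value" from the start of Outer: sum of the offsets along the path
total_offset = Outer.inner.offset + Inner.value.offset
print(total_offset)
```

No instance is created here, and padding is already accounted for in each descriptor's offset.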

data_types.py (add the following code to the previous content):

class Struct0_1(ctypes.Structure):
    if PRAGMA_PACK:
        _pack_ = PRAGMA_PACK
    _fields_ = [
        ("i_0", ctypes.c_int),  # 4B
        ("s_0", ctypes.c_short),  # 2B
        ("struct1_0_2", Struct1 * 2),
        ("i_1", ctypes.c_int * 2),  # 2 * 4B
        ("struct1_1", Struct1),
        ("i_2", ctypes.c_int),  # 4B
        ("struct1_2_3", Struct1 * 3),
    ]

struct_util_v2.py:

import sys
import ctypes

import data_types


WANTED_MEMBER_NAME = "wanted"

def _get_nested_offset_recursive_struct(struct_ctype, member_name):
    for struct_member_name, struct_member_ctype in struct_ctype._fields_:
        struct_member = getattr(struct_ctype, struct_member_name)
        offset = struct_member.offset
        if struct_member_name == member_name:
            return offset
        else:
            if issubclass(struct_member_ctype, ctypes.Structure):
                inner_offset = _get_nested_offset_recursive_struct(struct_member_ctype, member_name)
            elif issubclass(struct_member_ctype, ctypes.Array):
                inner_offset = _get_nested_offset_recursive_array(struct_member_ctype, member_name)
            else:
                inner_offset = -1
            if inner_offset != -1:
                return inner_offset + offset
    return -1


def _get_nested_offset_recursive_array(array_ctype, member_name):
    # All elements share one layout, so the 1st occurrence (if any) is in element 0
    array_base_ctype = array_ctype._type_
    if array_ctype._length_ == 0:
        return -1  # Empty array: nothing to search
    if issubclass(array_base_ctype, ctypes.Structure):
        return _get_nested_offset_recursive_struct(array_base_ctype, member_name)
    elif issubclass(array_base_ctype, ctypes.Array):
        return _get_nested_offset_recursive_array(array_base_ctype, member_name)
    return -1


def get_nested_offset_recursive(ctype, member_name, nth=1):
    if issubclass(ctype, ctypes.Structure):
        return _get_nested_offset_recursive_struct(ctype, member_name)
    elif issubclass(ctype, ctypes.Array):
        return _get_nested_offset_recursive_array(ctype, member_name)
    else:
        return -1


def main():
    struct_names = [
        "Struct2",
        "Struct1",
        "Struct0",
        "Struct0_1",
    ]
    member_name = WANTED_MEMBER_NAME
    print("PRAGMA_PACK: {:d} {:s}\n".format(data_types.PRAGMA_PACK, "" if data_types.PRAGMA_PACK else "(default)"))
    for struct_name in struct_names:
        struct_ctype = getattr(data_types, struct_name)
        print("'{:s}' offset in '{:s}' (size: {:3d}): {:3d}".format(member_name,
                                                                    struct_name,
                                                                    ctypes.sizeof(struct_ctype),
                                                                    get_nested_offset_recursive(struct_ctype, member_name)))


if __name__ == "__main__":
    print("Python {:s} on {:s}\n".format(sys.version, sys.platform))
    main()

Notes:

  • No more working with instances as the offset metadata is stored in the class itself (addressof no longer required)
  • The previous code didn't work for the newly added structure
  • The true power of the new code would show when handling get_nested_offset_recursive's nth argument (which right now does nothing and can be removed): it would tell which occurrence of the member name should have its offset reported (it only makes sense for arrays of structures), but that's a bit more complicated and thus requires more code
  • Structure members that are pointers to structures could be subject to debate (some might argue for treating them as arrays), but I think that since such (inner) structures reside in another memory area, they should just be skipped (the fact that the code is simpler using this approach had nothing to do with the decision)
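
The last note, about pointer members, can be seen in a short sketch. Inner and Holder are hypothetical structures used only for illustration:

```python
import ctypes


class Inner(ctypes.Structure):
    _fields_ = [("wanted", ctypes.c_int)]


class Holder(ctypes.Structure):
    _fields_ = [("count", ctypes.c_int), ("inner_ptr", ctypes.POINTER(Inner))]


pointer_type = ctypes.POINTER(Inner)

# A pointer type is neither a Structure nor an Array, so issubclass checks
# like the ones used in the search fall through and the member is skipped
print(issubclass(pointer_type, ctypes.Structure),
      issubclass(pointer_type, ctypes.Array))

# Only the pointer itself is stored in Holder; the Inner it refers to lives
# elsewhere, so it has no meaningful offset relative to Holder
print(Holder.inner_ptr.size == ctypes.sizeof(ctypes.c_void_p))
```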

Output:

(py35x64_test) e:\Work\Dev\StackOverflow\q050304516>python struct_util_v2.py
Python 3.5.4 (v3.5.4:3f56838, Aug  8 2017, 02:17:05) [MSC v.1900 64 bit (AMD64)] on win32

PRAGMA_PACK: 0 (default)

'wanted' offset in 'Struct2' (size:   8):   4
'wanted' offset in 'Struct1' (size:  24):  16
'wanted' offset in 'Struct0' (size:  32):  24
'wanted' offset in 'Struct0_1' (size: 168):  24

(py35x64_test) e:\Work\Dev\StackOverflow\q050304516>rem change PRAGMA_PACK = 1 in data_types.py

(py35x64_test) e:\Work\Dev\StackOverflow\q050304516>python struct_util_v2.py
Python 3.5.4 (v3.5.4:3f56838, Aug  8 2017, 02:17:05) [MSC v.1900 64 bit (AMD64)] on win32

PRAGMA_PACK: 1

'wanted' offset in 'Struct2' (size:   7):   3
'wanted' offset in 'Struct1' (size:  16):  12
'wanted' offset in 'Struct0' (size:  22):  18
'wanted' offset in 'Struct0_1' (size: 114):  18

@EDIT1:

Added support for nth argument (renamed it: index).

struct_util_v3.py:

import sys
import ctypes

import data_types


WANTED_MEMBER_NAME = "wanted"
OFFSET_INVALID = -1

def _get_nested_offset_recursive_struct(struct_ctype, member_name, index):
    current_index = 0
    for struct_member_name, struct_member_ctype in struct_ctype._fields_:
        struct_member = getattr(struct_ctype, struct_member_name)
        offset = struct_member.offset
        if struct_member_name == member_name:
            if index == 0:
                return offset, 0
            else:
                current_index += 1
        else:
            if issubclass(struct_member_ctype, ctypes.Structure):
                inner_offset, occurences = _get_nested_offset_recursive_struct(struct_member_ctype, member_name, index - current_index)
            elif issubclass(struct_member_ctype, ctypes.Array):
                inner_offset, occurences = _get_nested_offset_recursive_array(struct_member_ctype, member_name, index - current_index)
            else:
                inner_offset, occurences = OFFSET_INVALID, 0
            if inner_offset != OFFSET_INVALID:
                return inner_offset + offset, 0
            else:
                current_index += occurences
    return OFFSET_INVALID, current_index


def _get_nested_offset_recursive_array(array_ctype, member_name, index):
    array_base_ctype = array_ctype._type_
    array_base_ctype_size = ctypes.sizeof(array_base_ctype)
    current_index = 0
    for idx in range(array_ctype._length_):
        if issubclass(array_base_ctype, ctypes.Structure):
            inner_offset, occurences = _get_nested_offset_recursive_struct(array_base_ctype, member_name, index - current_index)
        elif issubclass(array_base_ctype, ctypes.Array):
            inner_offset, occurences = _get_nested_offset_recursive_array(array_base_ctype, member_name, index - current_index)
        else:
            inner_offset, occurences = OFFSET_INVALID, 0
        if inner_offset != OFFSET_INVALID:
            return array_base_ctype_size * idx + inner_offset, 0
        else:
            if occurences == 0:
                return OFFSET_INVALID, 0
            else:
                current_index += occurences
    return OFFSET_INVALID, current_index


def get_nested_offset_recursive(ctype, member_name, index=0):
    if index < 0:
        return OFFSET_INVALID
    if issubclass(ctype, ctypes.Structure):
        return _get_nested_offset_recursive_struct(ctype, member_name, index)[0]
    elif issubclass(ctype, ctypes.Array):
        return _get_nested_offset_recursive_array(ctype, member_name, index)[0]
    else:
        return OFFSET_INVALID


def main():
    struct_names = [
        "Struct2",
        "Struct1",
        "Struct0",
        "Struct0_1",
    ]
    member_name = WANTED_MEMBER_NAME
    print("PRAGMA_PACK: {:d} {:s}\n".format(data_types.PRAGMA_PACK, "" if data_types.PRAGMA_PACK else "(default)"))
    for struct_name in struct_names:
        struct_ctype = getattr(data_types, struct_name)
        nth = 1
        ofs = get_nested_offset_recursive(struct_ctype, member_name, index=nth - 1)
        while ofs != OFFSET_INVALID:
            print("'{:s}' offset (#{:03d}) in '{:s}' (size: {:3d}): {:3d}".format(member_name,
                                                                                 nth,
                                                                                 struct_name,
                                                                                 ctypes.sizeof(struct_ctype),
                                                                                 ofs))
            nth += 1
            ofs = get_nested_offset_recursive(struct_ctype, member_name, index=nth - 1)


if __name__ == "__main__":
    print("Python {:s} on {:s}\n".format(sys.version, sys.platform))
    main()

Notes:

  • get_nested_offset_recursive's index argument is the (0 based) index in the member occurrences list - or how many occurrences to skip before reporting the offset (default: 0 - meaning that it will report 1st occurrence's offset)
  • Didn't test thoroughly, but I think I covered all cases
  • For each structure, the program lists the offsets of all member occurrences (until no more are found)
  • Now, the code is in the shape that I thought of at the beginning
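
For comparison, the same search could also be written as a generator that yields every occurrence in one pass, instead of re-running the search once per index. This is only a sketch of an alternative, not the code above: iter_member_offsets, Inner and Outer are hypothetical names, and the offsets in the usage note assume c_int is 4 bytes long with 4 byte alignment:

```python
import ctypes


def iter_member_offsets(ctype, member_name, base=0):
    """Yield the byte offset of every field named member_name inside ctype."""
    if issubclass(ctype, ctypes.Array):
        element_ctype = ctype._type_
        element_size = ctypes.sizeof(element_ctype)
        for idx in range(ctype._length_):
            # Each element starts idx * element_size bytes into the array
            yield from iter_member_offsets(
                element_ctype, member_name, base + idx * element_size)
    elif issubclass(ctype, ctypes.Structure):
        for field in ctype._fields_:
            field_name, field_ctype = field[:2]
            field_offset = base + getattr(ctype, field_name).offset
            if field_name == member_name:
                yield field_offset
            else:
                # Simple types and pointers yield nothing when recursed into
                yield from iter_member_offsets(
                    field_ctype, member_name, field_offset)


class Inner(ctypes.Structure):
    _fields_ = [("c", ctypes.c_char), ("wanted", ctypes.c_int)]


class Outer(ctypes.Structure):
    _fields_ = [("n", ctypes.c_int), ("items", Inner * 2), ("single", Inner)]


offsets = list(iter_member_offsets(Outer, "wanted"))
print(offsets)
```

With these definitions the list should contain three offsets (one per Inner element in items, plus the one in single), and the first occurrence can be taken with next(iter_member_offsets(Outer, "wanted"), -1).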

Output:

(py35x64_test) e:\Work\Dev\StackOverflow\q050304516>python struct_util_v3.py
Python 3.5.4 (v3.5.4:3f56838, Aug  8 2017, 02:17:05) [MSC v.1900 64 bit (AMD64)] on win32

PRAGMA_PACK: 0 (default)

'wanted' offset (#001) in 'Struct2' (size:   8):   4
'wanted' offset (#001) in 'Struct1' (size:  24):  16
'wanted' offset (#001) in 'Struct0' (size:  32):  24
'wanted' offset (#001) in 'Struct0_1' (size: 192):  24
'wanted' offset (#002) in 'Struct0_1' (size: 192):  48
'wanted' offset (#003) in 'Struct0_1' (size: 192):  72
'wanted' offset (#004) in 'Struct0_1' (size: 192): 104
'wanted' offset (#005) in 'Struct0_1' (size: 192): 136
'wanted' offset (#006) in 'Struct0_1' (size: 192): 160

(py35x64_test) e:\Work\Dev\StackOverflow\q050304516>rem change PRAGMA_PACK = 1 in data_types.py

(py35x64_test) e:\Work\Dev\StackOverflow\q050304516>python struct_util_v3.py
Python 3.5.4 (v3.5.4:3f56838, Aug  8 2017, 02:17:05) [MSC v.1900 64 bit (AMD64)] on win32

PRAGMA_PACK: 1

'wanted' offset (#001) in 'Struct2' (size:   7):   3
'wanted' offset (#001) in 'Struct1' (size:  16):  12
'wanted' offset (#001) in 'Struct0' (size:  22):  18
'wanted' offset (#001) in 'Struct0_1' (size: 130):  18
'wanted' offset (#002) in 'Struct0_1' (size: 130):  34
'wanted' offset (#003) in 'Struct0_1' (size: 130):  50
'wanted' offset (#004) in 'Struct0_1' (size: 130):  74
'wanted' offset (#005) in 'Struct0_1' (size: 130):  94
'wanted' offset (#006) in 'Struct0_1' (size: 130): 110

CristiFati answered Oct 22 '22 07:10