
What is the difference between id(obj) and ctypes.addressof(obj) in CPython?

Say I define the following variable using the ctypes module:

i = c_int(4)

and afterwards I try to find out the memory address of i using:

id(i)

or

ctypes.addressof(i)

which, at the moment, yield different values. Why is that?

user228137 asked May 12 '14 02:05




1 Answer

The assumption behind your question, that id() returns a memory address, is an implementation detail of CPython.

The id() function:

Return the “identity” of an object. This is an integer which is guaranteed to be unique and constant for this object during its lifetime.

CPython implementation detail: This is the address of the object in memory.

While the identity and the object's memory address are the same thing in CPython, this is not guaranteed to be true in other implementations of Python.
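
To illustrate what the documentation does guarantee, here is a minimal sketch (the variable names are mine, not from the question). The identity follows the object rather than its name or value, and two objects that are alive at the same time never share one:

```python
import ctypes

a = ctypes.c_int(4)
b = ctypes.c_int(4)   # equal value, but a separate object
alias = a             # a second name for the same object

same_object = id(a) == id(alias)   # identity follows the object, not the name
distinct = id(a) != id(b)          # two live objects never share an id
print(same_object, distinct)
```

Nothing here relies on id() being an address, so it should hold on any Python implementation that provides ctypes.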


Why are they different values, even in CPython?

Note that a c_int:

  • is a Python object. CPython's id() will return the address of this object.

  • contains a 4-byte C-compatible int value. ctypes.addressof() will return the address of this.

The metadata in a Python object takes up space. Because of this, that 4-byte value probably won't live at the very beginning of the Python object.

Look at this example:

>>> import ctypes
>>> i = ctypes.c_int(4)
>>> hex(id(i))
'0x22940d0'
>>> hex(ctypes.addressof(i))
'0x22940f8'

We see that the addressof result is only 0x28 bytes higher than the result of id(). Repeating this with fresh instances gives the same offset every time on this build, so I'd say that there are 0x28 bytes of Python object metadata preceding the actual int value in the overall c_int. The exact offset depends on the platform and the Python version.

In my above example:

   c_int
 ___________
|           |   0x22940d0   This is what id() returns
| metadata  |
|           |
|           |
|           |
|           |
|___________|
|   value   |   0x22940f8   This is what addressof() returns
|___________|
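
The constant offset can be checked with a short script instead of repeating the experiment by hand. This is a sketch that assumes CPython (where id() is the object's address); value_offset is a helper name I made up for illustration:

```python
import ctypes

def value_offset(obj):
    """Bytes between the object's start (id() in CPython) and its C data."""
    return ctypes.addressof(obj) - id(obj)

# Keep all instances alive so that each one has its own address.
instances = [ctypes.c_int(n) for n in range(8)]
offsets = {value_offset(obj) for obj in instances}

# On CPython this set is expected to hold a single positive number;
# the number itself depends on the platform and Python version.
print([hex(o) for o in offsets])
```

On the build used in the example above this would print ['0x28']; on other builds the single value is likely to differ.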

Edit:

In the CPython implementation of ctypes, the base CDataObject (2.7.6 source) has a b_ptr member that points to the memory block used for the object's C data:

union value {
                char c[16];
                short s;
                int i;
                long l;
                float f;
                double d;
#ifdef HAVE_LONG_LONG
                PY_LONG_LONG ll;
#endif
                long double D;
};

struct tagCDataObject {
    PyObject_HEAD
    char *b_ptr;                /* pointer to memory block */
    int  b_needsfree;           /* need _we_ free the memory? */
    CDataObject *b_base;        /* pointer to base object or NULL */
    Py_ssize_t b_size;          /* size of memory block in bytes */
    Py_ssize_t b_length;        /* number of references we need */
    Py_ssize_t b_index;         /* index of this object into base's
                                   b_object list */
    PyObject *b_objects;        /* dictionary of references we need 
                                   to keep, or Py_None */
    union value b_value;
};

addressof returns this pointer as a Python integer:

static PyObject *
addressof(PyObject *self, PyObject *obj)
{
    if (CDataObject_Check(obj))
        return PyLong_FromVoidPtr(((CDataObject *)obj)->b_ptr);
    PyErr_SetString(PyExc_TypeError,
                    "invalid type");
    return NULL;
}
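
To see that this address really is where the C value lives, one can read and write the memory through it. A minimal sketch using only documented ctypes calls (string_at, sizeof, from_address); the variable names are illustrative:

```python
import ctypes
import sys

i = ctypes.c_int(4)
addr = ctypes.addressof(i)

# Read the raw bytes of the C int directly from that address.
raw = ctypes.string_at(addr, ctypes.sizeof(i))
decoded = int.from_bytes(raw, sys.byteorder, signed=True)

# Build a second c_int that shares the same memory (no copy is made).
view = ctypes.c_int.from_address(addr)
view.value = 7   # writes through to i's buffer

print(decoded, i.value)
```

Note that view does not keep i alive, so i must stay referenced for as long as view is used.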

Small C objects use the default 16-byte b_value member of the CDataObject. As the example above shows, this default buffer is used for the c_int(4) instance. We can turn ctypes on itself to introspect c_int(4) in a 32-bit process:

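>>> from ctypes import c_int, addressof
>>> # CDataObject: a ctypes.Structure mirroring the C struct above (definition not shown)
>>> i = c_int(4)
>>> ci = CDataObject.from_address(id(i))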

>>> ci
ob_base: 
    ob_refcnt: 1
    ob_type: py_object(<class 'ctypes.c_long'>)
b_ptr: 3071814328
b_needsfree: 1
b_base: LP_CDataObject(<NULL>)
b_size: 4
b_length: 0
b_index: 0
b_objects: py_object(<NULL>)
b_value: 
    c: b'\x04'
    s: 4
    i: 4
    l: 4
    f: 5.605193857299268e-45
    d: 2e-323
    ll: 4
    D: 0.0

>>> addressof(i)
3071814328
>>> id(i) + CDataObject.b_value.offset
3071814328

This trick leverages the fact that id in CPython returns the base address of an object.
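
The point about small C objects using the inline b_value buffer can also be observed without mirroring the struct. The sketch below assumes CPython and, as I understand the CPython source, that data too large for b_value is placed in a separately allocated block; data_is_inline is a hypothetical helper, not part of ctypes:

```python
import ctypes

def data_is_inline(obj):
    """True if the C data lies inside the Python object's own memory block.

    Assumes CPython, where id() is the object's address and
    type(obj).__basicsize__ is the size of its fixed part.
    """
    offset = ctypes.addressof(obj) - id(obj)
    return 0 <= offset < type(obj).__basicsize__

small = ctypes.c_int(4)            # 4 bytes of C data, fits in b_value
large = (ctypes.c_char * 4096)()   # 4096 bytes of C data, too big for b_value

print(data_is_inline(small), data_is_inline(large))
```

For the large array, addressof() points at the separate block, so the difference from id() is unrelated to the object's layout.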

Jonathon Reinhart answered Nov 15 '22 18:11