
How to monkey patch python list __setitem__ method

I'd like to monkey-patch Python lists, in particular, replacing the __setitem__ method with custom code. Note that I am not trying to extend, but to overwrite the builtin types. For example:

>>> # Monkey Patch  
... # Replace list.__setitem__ with a Noop
...
>>> myList = [1,2,3,4,5]
>>> myList[0] = "Nope"
>>> myList
[1, 2, 3, 4, 5]

Yes, I know that is a downright perverted thing to do to Python code. No, my use case doesn't really make sense. Nonetheless, can it be done?

Possible avenues:

  • Setting a read only attribute on builtins using ctypes
  • The forbiddenfruit module allows patching of C builtins, but does not work when trying to override the list methods
  • This Gist also manages monkey patching of builtins by manipulating the object's dictionary. I've updated it to Python 3 here, but it still doesn't allow overriding the methods.
  • The Pyrthon library overrides the list type in a module to make it immutable by using AST transformation. This could be worth investigating.
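To illustrate the AST-transformation avenue, here is a minimal sketch of the general idea. It is not Pyrthon's actual implementation; `FrozenList` and `WrapListLiterals` are hypothetical names invented for this example. The transformer rewrites every list literal in a piece of source code into a call to a list subclass whose `__setitem__` does nothing.

```python
import ast


class FrozenList(list):
    """Hypothetical list subclass whose item assignment is a no-op."""

    def __setitem__(self, key, value):
        pass


class WrapListLiterals(ast.NodeTransformer):
    """Rewrite `[a, b]` into `FrozenList([a, b])` wherever a list is read."""

    def visit_List(self, node):
        self.generic_visit(node)  # handle nested list literals first
        if not isinstance(node.ctx, ast.Load):
            return node  # leave unpacking targets such as `[a, b] = pair` alone
        call = ast.Call(
            func=ast.Name(id="FrozenList", ctx=ast.Load()),
            args=[node],
            keywords=[],
        )
        return ast.copy_location(call, node)


source = "my_list = [1, 2, 3, 4, 5]\nmy_list[0] = 'Nope'\n"
tree = WrapListLiterals().visit(ast.parse(source))
ast.fix_missing_locations(tree)

namespace = {"FrozenList": FrozenList}
exec(compile(tree, "<patched>", "exec"), namespace)
print(namespace["my_list"])  # [1, 2, 3, 4, 5]
```

This only affects list literals in code that is compiled through the transformer; lists created elsewhere (for example by `list(...)` or by C code) remain ordinary lists.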

Demonstrative example

I actually managed to override the methods themselves, as shown below:

import ctypes

def magic_get_dict(o):
    # find address of dict whose offset is stored in the type
    dict_addr = id(o) + type(o).__dictoffset__
    # retrieve the dict object itself
    dict_ptr = ctypes.cast(dict_addr, ctypes.POINTER(ctypes.py_object))
    return dict_ptr.contents.value

def magic_flush_mro_cache():
    ctypes.PyDLL(None).PyType_Modified(ctypes.cast(id(object), ctypes.py_object))

print(list.__setitem__)
dct = magic_get_dict(list)
dct['__setitem__'] = lambda s, k, v: s
magic_flush_mro_cache()
print(list.__setitem__)

x = [1,2,3,4,5]
print(x.__setitem__)
x.__setitem__(0,10)
x[1] = 20
print(x)

Which outputs the following:

➤ python3 override.py
<slot wrapper '__setitem__' of 'list' objects>
<function <lambda> at 0x10de43f28>
<bound method <lambda> of [1, 2, 3, 4, 5]>
[1, 20, 3, 4, 5]

But as the output shows, this only affects explicit calls to x.__setitem__; the normal item-assignment syntax (x[1] = 20) still modifies the list.

Alternative: Monkey patching an individual list instance

As a lesser alternative, if I were able to monkey-patch an individual list instance, that could work too, perhaps by changing the class pointer of the list to a custom class.
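For reference, a small sketch of what plain Python allows here. `NoSetList` is a hypothetical name used only for this illustration. A list subclass can override `__setitem__`, but CPython refuses to retarget the `__class__` of an existing plain list, which is why the question looks at lower-level tricks.

```python
class NoSetList(list):
    """Hypothetical list subclass whose item assignment silently does nothing."""

    def __setitem__(self, key, value):
        pass


# Instances created from the subclass ignore item assignment.
patched = NoSetList([1, 2, 3, 4, 5])
patched[0] = "Nope"
print(patched)  # [1, 2, 3, 4, 5]

# An existing plain list cannot simply be switched over to the subclass.
plain = [1, 2, 3, 4, 5]
try:
    plain.__class__ = NoSetList
    reassigned = True
except TypeError as exc:
    reassigned = False
    print("cannot retarget a plain list:", exc)
```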

asked Jul 08 '16 01:07 by brice




1 Answer

A little late to the party, but nonetheless, here's the answer.

As user2357112 hinted in the comment above, modifying the dict won't suffice, since __setitem__ (like other double-underscore names) is mapped to a type slot, and the slot won't be updated without calling update_slot (which isn't exported, so that would be a little tricky).
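The distinction between ordinary attribute lookup and slot dispatch can be seen without any ctypes, using a user-defined class as an analogy. `Store` is a hypothetical class for this sketch. The `obj[key] = value` syntax is dispatched through the type, so an attribute placed on the instance is ignored, while assigning on the class itself goes through the type's attribute setter and takes effect.

```python
class Store:
    """Hypothetical container used only to show how item assignment is dispatched."""

    def __init__(self):
        self.data = {}

    def __setitem__(self, key, value):
        self.data[key] = value


s = Store()

# Shadow the method on the instance only.
s.__setitem__ = lambda key, value: None

s["a"] = 1              # syntax ignores the instance attribute, uses Store.__setitem__
s.__setitem__("b", 2)   # explicit call finds the instance attribute, so nothing is stored
print(s.data)           # {'a': 1}

# Assigning on the class itself changes what the syntax does.
Store.__setitem__ = lambda self, key, value: None
s["c"] = 3
print(s.data)           # {'a': 1}
```

For a built-in type such as list, that class-level assignment is what the interpreter normally refuses, which is the restriction the code below works around.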

Inspired by the above comment, here's a working example of making __setitem__ a no-op for specific lists:

# assuming v3.8 (tested on Windows x64 and Ubuntu x64)
# definition of PyTypeObject: https://github.com/python/cpython/blob/3.8/Include/cpython/object.h#L177
# no extensive testing was performed and I'll let others decide if this is a good idea or not, but it's possible

import ctypes

Py_TPFLAGS_HEAPTYPE = (1 << 9)

# calculate the offset of the tp_flags field
offset  = ctypes.sizeof(ctypes.c_ssize_t) * 1 # PyObject_VAR_HEAD.ob_base.ob_refcnt
offset += ctypes.sizeof(ctypes.c_void_p)  * 1 # PyObject_VAR_HEAD.ob_base.ob_type
offset += ctypes.sizeof(ctypes.c_ssize_t) * 1 # PyObject_VAR_HEAD.ob_size
offset += ctypes.sizeof(ctypes.c_void_p)  * 1 # tp_name
offset += ctypes.sizeof(ctypes.c_ssize_t) * 2 # tp_basicsize+tp_itemsize
offset += ctypes.sizeof(ctypes.c_void_p)  * 1 # tp_dealloc
offset += ctypes.sizeof(ctypes.c_ssize_t) * 1 # tp_vectorcall_offset
offset += ctypes.sizeof(ctypes.c_void_p)  * 7 # tp_getattr+tp_setattr+tp_as_async+tp_repr+tp_as_number+tp_as_sequence+tp_as_mapping
offset += ctypes.sizeof(ctypes.c_void_p)  * 6 # tp_hash+tp_call+tp_str+tp_getattro+tp_setattro+tp_as_buffer

tp_flags = ctypes.c_ulong.from_address(id(list) + offset)
assert tp_flags.value == list.__flags__ # should be the same

lst1 = [1,2,3]
lst2 = [1,2,3]
dont_set_me = [lst1] # these lists cannot be set

# define new method
orig = list.__setitem__
def new_setitem(self, *args):
    if any(item is self for item in dont_set_me): # identity check, not equality
        print('Nope')
    else:
        return orig(self, *args)

tp_flags.value |= Py_TPFLAGS_HEAPTYPE # add flag, to allow type_setattro to continue
list.__setitem__ = new_setitem # set method, this will already call PyType_Modified and update_slot
tp_flags.value &= (~Py_TPFLAGS_HEAPTYPE) # remove flag

print(lst1, lst2)       # > [1, 2, 3] [1, 2, 3]
lst1[0],lst2[0]='x','x' # > Nope
print(lst1, lst2)       # > [1, 2, 3] ['x', 2, 3]
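As a cross-check of the hand-computed offset, the same field order can be written as a ctypes.Structure so that ctypes does the arithmetic. This is a read-only sketch that assumes the 3.8 PyTypeObject layout linked above on a standard CPython build; `TypeObjectPrefix` is a hypothetical name, and nothing here modifies the interpreter.

```python
import ctypes

Py_TPFLAGS_HEAPTYPE = 1 << 9

# Pointer-sized fields between tp_vectorcall_offset and tp_flags (assumed 3.8 layout).
_POINTER_FIELDS = [
    "tp_getattr", "tp_setattr", "tp_as_async", "tp_repr",
    "tp_as_number", "tp_as_sequence", "tp_as_mapping",
    "tp_hash", "tp_call", "tp_str", "tp_getattro", "tp_setattro", "tp_as_buffer",
]


class TypeObjectPrefix(ctypes.Structure):
    """Hypothetical mirror of the start of PyTypeObject, up to and including tp_flags."""

    _fields_ = (
        [
            ("ob_refcnt", ctypes.c_ssize_t),
            ("ob_type", ctypes.c_void_p),
            ("ob_size", ctypes.c_ssize_t),
            ("tp_name", ctypes.c_void_p),
            ("tp_basicsize", ctypes.c_ssize_t),
            ("tp_itemsize", ctypes.c_ssize_t),
            ("tp_dealloc", ctypes.c_void_p),
            ("tp_vectorcall_offset", ctypes.c_ssize_t),
        ]
        + [(name, ctypes.c_void_p) for name in _POINTER_FIELDS]
        + [("tp_flags", ctypes.c_ulong)]
    )


flags_offset = TypeObjectPrefix.tp_flags.offset

# Read (never write) the flags word and compare it with what Python reports.
raw_flags = ctypes.c_ulong.from_address(id(list) + flags_offset).value
layout_matches = raw_flags == list.__flags__
print("offset:", flags_offset, "layout matches:", layout_matches)


class UserList(list):
    pass


# Built-in types are static; classes created at runtime carry the heap-type flag.
print(bool(list.__flags__ & Py_TPFLAGS_HEAPTYPE))      # False
print(bool(UserList.__flags__ & Py_TPFLAGS_HEAPTYPE))  # True
```

If `layout_matches` prints False, the assumed layout does not apply to the running interpreter and the offset should not be used for writing.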

Edit
See here for why it's not supported to begin with. Mainly, as explained by Guido van Rossum:

This is prohibited intentionally to prevent accidental fatal changes to built-in types (fatal to parts of the code that you never thought of). Also, it is done to prevent the changes to affect different interpreters residing in the address space, since built-in types (unlike user-defined classes) are shared between all such interpreters.
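The prohibition is easy to observe from plain Python. The short sketch below (`noop_setitem` is a hypothetical helper) tries the two obvious routes and records the errors; the exact messages vary between Python versions, so only the exception type is relied on.

```python
def noop_setitem(self, key, value):
    """Hypothetical replacement that ignores the assignment."""
    return None


errors = []

# Route 1: ordinary attribute assignment on the built-in type.
try:
    list.__setitem__ = noop_setitem
except TypeError as exc:
    errors.append(exc)

# Route 2: writing into the type's namespace, which is exposed as a read-only proxy.
try:
    list.__dict__["__setitem__"] = noop_setitem
except TypeError as exc:
    errors.append(exc)

for exc in errors:
    print(type(exc).__name__, "-", exc)

print(type(list.__dict__).__name__)  # mappingproxy
```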

I also searched for all usages of Py_TPFLAGS_HEAPTYPE in cpython and they all seem to be related to GC or some validations.

So I guess if:

  • You don't change the type's structure (I believe the above doesn't)
  • You're not using multiple interpreters in the same process
  • You add the flag and immediately remove it again, in a single-threaded state
  • You don't really do anything that can affect GC while the flag is set

You'll be just fine <generic disclaimer here>.

answered Oct 13 '22 01:10 by Eli Finkel