I'm trying to overload some methods of the string builtin. I know there is no really legitimate use-case for this, but the behavior still bugs me so I would like to get an explanation of what is happening here:
Using Python 2 and the forbiddenfruit module.
>>> from forbiddenfruit import curse
>>> curse(str, '__repr__', lambda self:'bar')
>>> 'foo'
'foo'
>>> 'foo'.__repr__()
'bar'
As you can see, the __repr__ function has been successfully overloaded, but it isn't actually called when we ask for a representation. Why is that?
Then, how would you get the expected behaviour:
>>> 'foo'
'bar'
There is no constraint about setting up a custom environment; if rebuilding Python is what it takes, so be it. But I really don't know where to start, and I still hope there is an easier way :)
The first thing to note is that whatever forbiddenfruit is doing, it's not affecting repr at all. This isn't a special case for str, it just doesn't work like that:
import forbiddenfruit
class X:
    repr = None
repr(X())
#>>> '<X object at 0x7f907acf4c18>'
forbiddenfruit.curse(X, "__repr__", lambda self: "I am X")
repr(X())
#>>> '<X object at 0x7f907acf4c50>'
X().__repr__()
#>>> 'I am X'
X.__repr__ = X.__repr__
repr(X())
#>>> 'I am X'
I recently found a much simpler way of doing what forbiddenfruit does, thanks to a post by HYRY:
import gc
underlying_dict = gc.get_referents(str.__dict__)[0]
underlying_dict["__repr__"] = lambda self: print("I am a str!")
"hello".__repr__()
#>>> I am a str!
repr("hello")
#>>> "'hello'"
So we know, somewhat anticlimactically, that something else is going on.
Here's the source for builtin_repr:
static PyObject *
builtin_repr(PyModuleDef *module, PyObject *obj)
/*[clinic end generated code: output=988980120f39e2fa input=a2bca0f38a5a924d]*/
{
    return PyObject_Repr(obj);
}
And for PyObject_Repr (sections elided):
PyObject *
PyObject_Repr(PyObject *v)
{
    PyObject *res;
    res = (*v->ob_type->tp_repr)(v);
    if (res == NULL)
        return NULL;
}
The important point is that instead of looking up in a dict, it looks up the "cached" tp_repr attribute.
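The same idea can be seen from pure Python without touching any built-in type. This is only a small sketch (the class name Demo is made up for illustration, and it assumes Python 3): repr() goes through the type, so a __repr__ stored on the instance is ignored, while explicit attribute access finds it.

```python
class Demo:
    def __repr__(self):
        return "from the class"

demo = Demo()
# Shadow __repr__ on the instance only; the type itself is left untouched.
demo.__repr__ = lambda: "from the instance"

via_builtin = repr(demo)          # dispatches through the type, not the instance
via_attribute = demo.__repr__()   # ordinary attribute lookup finds the instance attribute

print(via_builtin)
print(via_attribute)
```

This is not the same mechanism as patching the type's dict behind its back, but it shows the same split between what attribute lookup sees and what repr() actually calls.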
Here's what happens when you set the attribute with something like TYPE.__repr__ = new_repr:
static int
type_setattro(PyTypeObject *type, PyObject *name, PyObject *value)
{
    if (!(type->tp_flags & Py_TPFLAGS_HEAPTYPE)) {
        PyErr_Format(
            PyExc_TypeError,
            "can't set attributes of built-in/extension type '%s'",
            type->tp_name);
        return -1;
    }
    if (PyObject_GenericSetAttr((PyObject *)type, name, value) < 0)
        return -1;
    return update_slot(type, name);
}
The first part is the thing preventing you from modifying built-in types. Then it sets the attribute generically (PyObject_GenericSetAttr) and, crucially, updates the slots.
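Both halves of that function are visible from Python. A minimal sketch, assuming Python 3 (the class name Heap is just for illustration): assigning to a built-in type is rejected with a TypeError, whose exact message differs between versions, while assigning to an ordinary class takes the normal path and repr() picks up the change immediately.

```python
class Heap:
    pass

before = repr(Heap())  # default object repr

# Built-in types are rejected by the flag check at the top of type_setattro.
try:
    str.__repr__ = lambda self: "bar"
    builtin_assignment_failed = False
except TypeError:
    builtin_assignment_failed = True

# A class defined in Python is a heap type, so the assignment goes through
# and the slot used by repr() is refreshed as part of it.
Heap.__repr__ = lambda self: "I am Heap"
after = repr(Heap())

print(builtin_assignment_failed)
print(after)
```

This is also why X.__repr__ = X.__repr__ "fixes" the earlier example: the assignment itself is what triggers the slot update.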
If you're interested in how that works, the source for update_slot is in CPython's Objects/typeobject.c. The crucial points are:
It's not an exported function, and
It modifies the PyTypeObject instance itself,
so replicating it would require hacking into the PyTypeObject type itself.
If you want to do so, probably the easiest thing to try would be (temporarily?) setting the Py_TPFLAGS_HEAPTYPE bit in type->tp_flags on the str class, so that the type->tp_flags & Py_TPFLAGS_HEAPTYPE check passes. This would allow setting the attribute normally. Of course, there are no guarantees this won't crash your interpreter.
This is not what I want to do (especially not through ctypes) unless I really have to, so I offer you a shortcut.
You write:
Then, how would you get the expected behaviour:
>>> 'foo'
'bar'
This is actually quite easy using sys.displayhook:
sys.displayhook is called on the result of evaluating an expression entered in an interactive Python session. The display of these values can be customized by assigning another one-argument function to sys.displayhook.
And here's an example:
import sys
old_displayhook = sys.displayhook
def displayhook(object):
    if type(object) is str:
        old_displayhook('bar')
    else:
        old_displayhook(object)
sys.displayhook = displayhook
And then... (!)
'foo'
#>>> 'bar'
123
#>>> 123
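If you want to check the hook outside an interactive session, something like the following should work. It is only a sketch: bar_displayhook and render are names invented here, and it relies on sys.__displayhook__ (the interpreter's saved copy of the default hook), which writes the repr of its argument to sys.stdout.

```python
import contextlib
import io
import sys

def bar_displayhook(value):
    """Show every str result as 'bar'; defer everything else unchanged."""
    sys.__displayhook__('bar' if type(value) is str else value)

def render(hook, value):
    """Call a display hook directly and capture what it writes to stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        hook(value)
    return buffer.getvalue()

shown_str = render(bar_displayhook, 'foo')
shown_int = render(bar_displayhook, 123)

print(shown_str, end='')
print(shown_int, end='')
```

To install it in a session you would assign sys.displayhook = bar_displayhook, and sys.displayhook = sys.__displayhook__ puts the default back.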
On the philosophical point of why repr would be cached like this, first consider:
1 + 1
It would be a pain if this had to look up __add__ in a dictionary before calling it. CPython is slow as it is, so it caches lookups of the standard dunder (double underscore) methods. __repr__ is one of those, even if it is less common to need that lookup optimized. This is still useful to keep formatting ('%s' % s) fast.
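One way to observe this shortcut from Python is to log attribute lookups and see that the + operator never shows up in the log. This is a rough sketch, assuming Python 3; Meta, Number and add_lookups are names invented for the demonstration.

```python
lookups = []

class Meta(type):
    def __getattribute__(cls, name):
        # Records every attribute lookup made on the class object itself.
        lookups.append(("type", name))
        return super().__getattribute__(name)

class Number(metaclass=Meta):
    def __init__(self, value):
        self.value = value

    def __getattribute__(self, name):
        # Records every attribute lookup made on an instance.
        lookups.append(("instance", name))
        return super().__getattribute__(name)

    def __add__(self, other):
        return Number(self.value + other.value)

def add_lookups():
    """Return only the logged lookups of the name '__add__'."""
    return [entry for entry in lookups if entry[1] == "__add__"]

# Implicit use through the operator: no '__add__' lookup is logged.
total = Number(1) + Number(1)
implicit = add_lookups()

# Explicit attribute access: the instance-level hook sees '__add__'.
lookups.clear()
explicit_total = Number(1).__add__(Number(1))
explicit = add_lookups()

print(implicit)
print(explicit)
```

The operator goes straight to the type's slot, which is the same reason repr() ignores a __repr__ that was slipped into the dict without updating tp_repr.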