As I read Python answers on Stack Overflow, I continue to see some people telling users to use the data model's special methods or attributes directly.
I then see contradicting advice (sometimes from myself) saying not to do that, and instead to use builtin functions and the operators directly.
Why is that? What is the relationship between the special "dunder" methods and attributes of the Python data model and builtin functions?
When am I supposed to use the special names?
The Python Data Model is a document in the official Python documentation that describes the Python language's concept of data, as opposed to how other languages treat data. It's full of abstract ideas and generally illegible to new software developers.
Thus, you should prefer to use the builtin functions and operators where possible over the special methods and attributes of the data model.
The semantically internal APIs are more likely to change than the public interfaces. While Python doesn't actually consider anything "private" and exposes the internals, that doesn't mean it's a good idea to abuse that access; doing so carries real risks.
The builtin functions and operators invoke the special methods and use the special attributes in the Python data model. They are the readable and maintainable veneer that hides the internals of objects. In general, users should use the builtins and operators provided by the language rather than calling the special methods or using the special attributes directly.
The builtin functions and operators can also have fallback or more elegant behavior than the more primitive data model special methods. For example:

- next(obj, default) allows you to provide a default instead of raising StopIteration when an iterator runs out, while obj.__next__() does not.
- str(obj) falls back to obj.__repr__() when obj.__str__() isn't available, whereas calling obj.__str__() directly would raise an AttributeError.
- obj != other falls back to not obj == other in Python 3 when no __ne__ is defined; calling obj.__ne__(other) would not take advantage of this.

(Builtin functions can also be easily overshadowed, if necessary or desirable, in a module's global scope or the builtins module, to further customize behavior.)
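To make the fallback behavior concrete, here is a minimal sketch; the Point class and the values are invented for illustration:

class Point(object):
    # only __repr__ is defined; there is deliberately no __str__
    def __init__(self, x, y):
        self.x, self.y = x, y
    def __repr__(self):
        return 'Point({}, {})'.format(self.x, self.y)

p = Point(1, 2)
print(str(p))            # falls back to __repr__: prints Point(1, 2)

it = iter([1])
next(it)                 # consumes the only element
print(next(it, 'done'))  # prints 'done' instead of raising StopIteration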
Here is a mapping, with notes, of the builtin functions and operators to the respective special methods and attributes that they use or return. Note that the usual rule is that a builtin function maps to a special method of the same name, but this is not consistent enough to make the table below redundant:
builtins/              special methods/
operators       ->     datamodel                   NOTES (fb == fallback)

repr(obj)              obj.__repr__()              provides fb behavior for str
str(obj)               obj.__str__()               fb to __repr__ if no __str__
bytes(obj)             obj.__bytes__()             Python 3 only
unicode(obj)           obj.__unicode__()           Python 2 only
format(obj)            obj.__format__()            format spec optional.
hash(obj)              obj.__hash__()
bool(obj)              obj.__bool__()              Python 3, fb to __len__
bool(obj)              obj.__nonzero__()           Python 2, fb to __len__
dir(obj)               obj.__dir__()
vars(obj)              obj.__dict__                does not include __slots__
type(obj)              obj.__class__               type actually bypasses __class__ -
                                                   overriding __class__ will not affect type
help(obj)              obj.__doc__                 help uses more than just __doc__
len(obj)               obj.__len__()               provides fb behavior for bool
iter(obj)              obj.__iter__()              fb to __getitem__ w/ indexes from 0 on
next(obj)              obj.__next__()              Python 3
next(obj)              obj.next()                  Python 2
reversed(obj)          obj.__reversed__()          fb to __len__ and __getitem__
other in obj           obj.__contains__(other)     fb to __iter__ then __getitem__
obj == other           obj.__eq__(other)
obj != other           obj.__ne__(other)           fb to not obj.__eq__(other) in Python 3
obj < other            obj.__lt__(other)           get >, >=, <= with @functools.total_ordering
complex(obj)           obj.__complex__()
int(obj)               obj.__int__()
float(obj)             obj.__float__()
round(obj)             obj.__round__()
abs(obj)               obj.__abs__()
The operator module has length_hint, which falls back to the special method __length_hint__ if __len__ is not implemented:

length_hint(obj)       obj.__length_hint__()
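As a quick illustration (the Chunks class is hypothetical), length_hint uses __length_hint__ when __len__ is absent, and accepts a default:

from operator import length_hint

class Chunks(object):
    # no __len__ here, only an estimate
    def __length_hint__(self):
        return 42

print(length_hint(Chunks()))     # 42, via __length_hint__
print(length_hint(object(), 0))  # 0, the supplied default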
Dotted lookups are contextual. Without special method implementations, the lookup first searches the class hierarchy for data descriptors (like properties and slots), then the instance __dict__ (for instance variables), then the class hierarchy for non-data descriptors (like methods). Special methods implement the following behaviors:
obj.attr               obj.__getattr__('attr')          provides fb if dotted lookup fails
obj.attr               obj.__getattribute__('attr')     preempts dotted lookup
obj.attr = _           obj.__setattr__('attr', _)       preempts dotted lookup
del obj.attr           obj.__delattr__('attr')          preempts dotted lookup
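A small sketch of how these hooks differ (the Recording class is made up): __getattr__ only runs as a fallback when normal lookup fails, while __setattr__ preempts every assignment:

class Recording(object):
    def __getattr__(self, name):
        # fallback: only called when the normal dotted lookup fails
        return 'missing: ' + name
    def __setattr__(self, name, value):
        # preempts every assignment
        print('setting', name)
        super(Recording, self).__setattr__(name, value)

r = Recording()
r.x = 1           # prints "setting x"
print(r.x)        # 1 -- found in the instance __dict__, __getattr__ not called
print(r.missing)  # "missing: missing" -- lookup failed, so __getattr__ runs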
Descriptors are a bit advanced - feel free to skip these entries and come back later - recall that the descriptor instance lives in the class hierarchy (like methods, slots, and properties). A data descriptor implements either __set__ or __delete__:
obj.attr               descriptor.__get__(obj, type(obj))
obj.attr = val         descriptor.__set__(obj, val)
del obj.attr           descriptor.__delete__(obj)
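For illustration, a minimal data descriptor (the names Positive and Account are invented); because it defines __set__, it takes priority over the instance __dict__:

class Positive(object):
    # a data descriptor: it defines __get__ and __set__
    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.__dict__.get('_value')
    def __set__(self, obj, value):
        if value <= 0:
            raise ValueError('must be positive')
        obj.__dict__['_value'] = value

class Account(object):
    balance = Positive()   # the descriptor instance lives on the class

a = Account()
a.balance = 10             # -> Positive.__set__(a, 10)
print(a.balance)           # -> Positive.__get__(a, Account), prints 10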
When the class is defined (that is, when the class object is created), the descriptor method __set_name__ is called on any descriptor that has it, to inform the descriptor of its attribute name. (This is new in Python 3.6.) Below, cls is the same as type(obj) above, and 'attr' stands in for the attribute name:
class cls:
    @descriptor_type
    def attr(self):
        pass
# -> descriptor.__set_name__(cls, 'attr')
The subscript notation is also contextual:
obj[name]              -> obj.__getitem__(name)
obj[name] = item       -> obj.__setitem__(name, item)
del obj[name]          -> obj.__delitem__(name)
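A hypothetical sketch wiring up the subscript hooks (the Registry class is invented, wrapping a plain dict):

class Registry(object):
    def __init__(self):
        self._data = {}
    def __getitem__(self, key):         # obj[key]
        return self._data[key]
    def __setitem__(self, key, value):  # obj[key] = value
        self._data[key] = value
    def __delitem__(self, key):         # del obj[key]
        del self._data[key]

reg = Registry()
reg['a'] = 1        # __setitem__
print(reg['a'])     # __getitem__ -> 1
del reg['a']        # __delitem__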
As a special case for subclasses of dict, __missing__ is called if __getitem__ doesn't find the key:
obj[name] -> obj.__missing__(name)
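For example, a minimal dict subclass (ZeroDict is an invented name) that supplies a value for missing keys, somewhat like collections.defaultdict:

class ZeroDict(dict):
    def __missing__(self, key):
        # called by dict.__getitem__ when the key is not found
        return 0

d = ZeroDict(a=1)
print(d['a'])   # 1, found normally
print(d['b'])   # 0, supplied by __missing__
print(d)        # {'a': 1} -- note __missing__ did not insert the key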
There are also special methods for the +, -, *, @, /, //, %, divmod(), pow(), **, <<, >>, &, ^, | operators, for example:

obj + other            -> obj.__add__(other), fallback to other.__radd__(obj)
obj | other            -> obj.__or__(other), fallback to other.__ror__(obj)
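A sketch of the reflected fallback (the Seconds class is invented): when the left operand cannot handle the operation and returns NotImplemented, Python tries the right operand's __radd__:

class Seconds(object):
    def __init__(self, n):
        self.n = n
    def __add__(self, other):
        if isinstance(other, Seconds):
            return Seconds(self.n + other.n)
        if isinstance(other, int):
            return Seconds(self.n + other)
        return NotImplemented            # lets Python try other.__radd__
    def __radd__(self, other):
        return self.__add__(other)       # addition is symmetric here
    def __repr__(self):
        return 'Seconds({})'.format(self.n)

print(Seconds(1) + 2)   # Seconds.__add__ -> Seconds(3)
print(2 + Seconds(1))   # int.__add__ fails, so Seconds.__radd__ -> Seconds(3)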
There are also in-place operators for augmented assignment, +=, -=, *=, @=, /=, //=, %=, **=, <<=, >>=, &=, ^=, |=, for example:

obj += other           -> obj.__iadd__(other)
obj |= other           -> obj.__ior__(other)
(If an in-place operator is not defined, Python falls back to the regular binary operation: for example, obj += other becomes obj = obj + other.)
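To illustrate (the Bag class is invented): an object with __iadd__ mutates in place, while a type without it, like tuple, is rebound to a new object:

class Bag(object):
    def __init__(self, items):
        self.items = list(items)
    def __iadd__(self, other):
        self.items.extend(other)   # mutate in place...
        return self                # ...and return self to be rebound

b = Bag([1, 2])
alias = b
b += [3]                # Bag.__iadd__; b and alias are still the same object
print(alias.items)      # [1, 2, 3]

t = (1, 2)
t += (3,)               # tuple has no __iadd__: falls back to t = t + (3,)
print(t)                # (1, 2, 3)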
There are also special methods for unary operations:

+obj                   -> obj.__pos__()
-obj                   -> obj.__neg__()
~obj                   -> obj.__invert__()
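For instance (a hypothetical Celsius wrapper):

class Celsius(object):
    def __init__(self, degrees):
        self.degrees = degrees
    def __neg__(self):
        return Celsius(-self.degrees)

print((-Celsius(5)).degrees)   # -5, via __neg__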
A context manager defines __enter__, which is called on entering the code block (its return value, usually self, is aliased with as), and __exit__, which is guaranteed to be called on leaving the code block, with exception information.
with obj as enters_return_value:  #-> enters_return_value = obj.__enter__()
    raise Exception('message')    #-> obj.__exit__(Exception,
                                  #->              Exception('message'),
                                  #->              traceback_object)
If __exit__ receives an exception and returns a falsey value, the exception is re-raised on leaving the with block (a truthy return value suppresses it).
If there is no exception, __exit__ gets None for those three arguments instead, and the return value is meaningless:
with obj:    #-> obj.__enter__()
    pass
             #-> obj.__exit__(None, None, None)
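A minimal, hypothetical context manager tying this together (the Managed class is invented); it returns self from __enter__ and, by returning a falsey value from __exit__, does not suppress exceptions:

class Managed(object):
    def __enter__(self):
        print('entering')
        return self                       # this is what "as" binds
    def __exit__(self, exc_type, exc_value, traceback):
        print('exiting, got', exc_type)
        return False                      # falsey: any exception is re-raised

with Managed() as m:
    print('inside the block with', m)
# prints: entering / inside the block with <...> / exiting, got None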
Similarly, classes can have special methods (from their metaclasses) that support abstract base classes:
isinstance(obj, cls)   -> cls.__instancecheck__(obj)
issubclass(sub, cls)   -> cls.__subclasscheck__(sub)
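A sketch (Python 3 syntax; the EvenMeta and Even names are invented) of a metaclass customizing isinstance via __instancecheck__:

class EvenMeta(type):
    def __instancecheck__(cls, obj):
        # isinstance(obj, Even) -> EvenMeta.__instancecheck__(Even, obj)
        return isinstance(obj, int) and obj % 2 == 0

class Even(metaclass=EvenMeta):
    pass

print(isinstance(4, Even))   # True
print(isinstance(3, Even))   # False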
An important takeaway is that while builtins like next and bool do not change between Python 2 and 3, the underlying implementation names did change (for example, next became __next__ and __nonzero__ became __bool__). Thus using the builtins also offers more forward compatibility.
In Python, names that begin with underscores are semantically non-public names for users. The underscore is the creator's way of saying, "hands-off, don't touch."
This is not just cultural; it is also reflected in Python's treatment of APIs. When a package's __init__.py uses import * to provide an API from a subpackage, if the subpackage does not provide an __all__, the import excludes names that start with underscores. The subpackage's __name__ would also be excluded.
IDE autocompletion tools are mixed in their consideration of names that start with underscores to be non-public. However, I greatly appreciate not seeing __init__, __new__, __repr__, __str__, __eq__, etc. (nor any of the user-created non-public interfaces) when I type the name of an object and a period.
Thus I assert:
The special "dunder" methods are not a part of the public interface. Avoid using them directly.
So when to use them?
The main use-case is when implementing your own custom object or subclass of a builtin object.
Try to only use them when absolutely necessary. Here are some examples:
The __name__ special attribute on functions or classes

When we decorate a function, we typically get a wrapper function in return that hides helpful information about the function. We would use the @wraps(fn) decorator to make sure we don't lose that information, but if we need the name of the function, we need to use the __name__ attribute directly:
from functools import wraps

def decorate(fn):
    @wraps(fn)
    def decorated(*args, **kwargs):
        print('calling fn,', fn.__name__)  # exception to the rule
        return fn(*args, **kwargs)
    return decorated
Similarly, I do the following when I need the name of the object's class in a method (used in, for example, a __repr__):
def get_class_name(self):
    return type(self).__name__
    #      ^- use type, not .__class__
    #                 ^- must use __name__, no builtin e.g. name()
When we want to define custom behavior, we must use the data-model names.
This makes sense: since we are the implementors, these attributes aren't private to us.
class Foo(object):
    # required here to implement == for instances:
    def __eq__(self, other):
        # but we still use == for the values:
        return self.value == other.value
    # required here to implement != for instances:
    def __ne__(self, other):    # docs recommend this for Python 2
        # use the higher level of abstraction here:
        return not self == other
However, even in this case, we don't use self.value.__eq__(other.value) or not self.__eq__(other) (see my answer here for proof that the latter can lead to unexpected behavior). Instead, we should use the higher level of abstraction.
Another point at which we'd need to use the special method names is when we are in a child's implementation, and want to delegate to the parent. For example:
class NoisyFoo(Foo):
    def __eq__(self, other):
        print('checking for equality')
        # required here to call the parent's method
        return super(NoisyFoo, self).__eq__(other)
The special methods allow users to implement the interface for object internals.
Use the builtin functions and operators wherever you can. Only use the special methods where there is no documented public API.
I'll show some usage that you apparently didn't think of, comment on the examples you showed, and argue against the privacy claim from your own answer.
I agree with your own answer that, for example, len(a) should be used, not a.__len__(). I'd put it like this: len exists so we can use it, and __len__ exists so len can use it. Or however that really works internally, since len(a) can actually be much faster, at least for example for lists and strings:
>>> timeit('len(a)', 'a = [1,2,3]', number=10**8)
4.22549770486512
>>> timeit('a.__len__()', 'a = [1,2,3]', number=10**8)
7.957335462257106
>>> timeit('len(s)', 's = "abc"', number=10**8)
4.1480574509332655
>>> timeit('s.__len__()', 's = "abc"', number=10**8)
8.01780160432645
But besides defining these methods in my own classes for usage by builtin functions and operators, I occasionally also use them as follows:
Let's say I need to give a filter function to some function and I want to use a set s as the filter. I'm not going to create an extra function lambda x: x in s or def f(x): return x in s. No. I already have a perfectly fine function that I can use: the set's __contains__ method. It's simpler and more direct. And even faster, as shown here (ignore that I save it as f here, that's just for this timing demo):
>>> timeit('f(2); f(4)', 's = {1, 2, 3}; f = s.__contains__', number=10**8)
6.473739433621368
>>> timeit('f(2); f(4)', 's = {1, 2, 3}; f = lambda x: x in s', number=10**8)
19.940786514456924
>>> timeit('f(2); f(4)', 's = {1, 2, 3}\ndef f(x): return x in s', number=10**8)
20.445680107760325
So while I don't directly call magic methods like s.__contains__(x), I do occasionally pass them somewhere, like some_function_needing_a_filter(s.__contains__). And I think that's perfectly fine, and better than the lambda/def alternative.
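For instance, a short sketch of that usage with the builtin filter (the data here is made up):

allowed = {2, 3, 5, 7}
values = [1, 2, 3, 4, 5, 6, 7, 8]

# the bound method __contains__ serves as the predicate; no lambda needed
print(list(filter(allowed.__contains__, values)))   # [2, 3, 5, 7]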
My thoughts on the examples you showed:
- items.__len__(), suggested without any reasoning. My verdict: that's just wrong. It should be len(items).
- d[key] = value is shown first! And then d.__setitem__(key, value) is added with the reasoning "if your keyboard is missing the square bracket keys", which rarely applies and which I doubt was serious. I think it was just the foot in the door for the last point, mentioning that that's how we can support the square bracket syntax in our own classes. Which turns it back to a suggestion to use square brackets.
- obj.__dict__. Bad, like the __len__ example. But I suspect he just didn't know vars(obj), and I can understand it, as vars is less common/known and the name does differ from the "dict" in __dict__.
- obj.__class__. Should be type(obj). I suspect it's similar to the __dict__ story, although I think type is more well-known.

About privacy: In your own answer you say these methods are "semantically private". I strongly disagree. Single and double leading underscores are for that, but not the data model's special "dunder/magic" methods with double leading+trailing underscores.
As a test in PyCharm, I defined methods _foo and __bar__, and autocompletion didn't offer _foo but did offer __bar__. And when I used both methods anyway, PyCharm only warned me about _foo (calling it a "protected member"), not about __bar__.

Besides Andrew's article I also checked several more about these "magic"/"dunder" methods, and I found none of them talking about privacy at all. That's just not what this is about.
Again, we should use len(a), not a.__len__(). But not because of privacy.