The is operator is used to test for identity.
I was wondering if the is operator and the id() function call any __magic__ method, the way == calls __eq__.
I had some fun checking out __hash__:
import random

class Foo(object):
    # A deliberately broken hash: a new random value on every call.
    def __hash__(self):
        return random.randint(0, 2 ** 32)

a = Foo()
b = {}
for i in range(5000):
    b[a] = i
Think about the dict b and the value of b[a]: every subsequent lookup of b[a] is either a KeyError or a random integer.
But as the docs on the special methods state:
"[the default implementation of] x.__hash__() returns id(x)."
So there is a relation between the two, but just the other way around.
I've seen many questions on is and id here, and the answers have helped many confused minds, but I couldn't find an answer to this one.
No. The is operator is a straight pointer comparison, and id just returns the address of the object cast to a long.
From ceval.c:
case PyCmp_IS:
    res = (v == w);
    break;
case PyCmp_IS_NOT:
    res = (v != w);
    break;
v and w here are simply PyObject *.
From bltinmodule.c:
static PyObject *
builtin_id(PyObject *self, PyObject *v)
{
    return PyLong_FromVoidPtr(v);
}
PyDoc_STRVAR(id_doc,
"id(object) -> integer\n\
\n\
Return the identity of an object. This is guaranteed to be unique among\n\
simultaneously existing objects. (Hint: it's the object's memory address.)");
The short answer is: No, they do not. As the docs that you link to say:
"The operators is and is not test for object identity: x is y is true if and only if x and y are the same object."
Being "the same object" is not something you're allowed to override. If your object is not the same object as another, it cannot pretend to be.
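A small sketch of that point, using a hypothetical class AlwaysEqual (not from the question) whose __eq__ and __hash__ claim every instance is interchangeable. Equality can be faked this way, but neither is nor id() consults any method on the object:

```python
class AlwaysEqual:
    """Hypothetical class: compares equal to anything, hashes to a constant."""

    calls = []  # records which special methods were invoked

    def __eq__(self, other):
        AlwaysEqual.calls.append("__eq__")
        return True

    def __hash__(self):
        AlwaysEqual.calls.append("__hash__")
        return 42


a = AlwaysEqual()
b = AlwaysEqual()

equal = (a == b)       # dispatches to AlwaysEqual.__eq__
same = (a is b)        # no special method involved
same_self = (a is a)
ident = id(a)          # no special method involved either

print(equal, same, same_self)
print(AlwaysEqual.calls)   # only the == comparison left a trace
```

The recorded call list contains just the one __eq__ call made by ==, which is a quick way to convince yourself that there is no hook to override.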
So… Why? What would be the harm of letting you override is and/or id? Obviously it would almost always be a stupid thing to do, but Python lets you do a lot of stupid things if you try hard enough.
The design FAQ and similar documents don't say. But I suspect it's primarily because it makes it easier to debug Python and some of the deeper standard library modules, knowing there is some way, from within the interpreter, to verify that two names really do refer to the same object, or to print out the id to make sure a name hasn't changed over time, etc. Imagine debugging weakref, or even pickle, without that.
So, what exactly does "same object" mean? Well, that's up to the interpreter. Obviously it has to be impossible to distinguish two instances of the same object at the language level, and probably at the interpreter level as well (especially since there's a well-defined API for plugging into most interpreter implementations).
All of the major implementations handle this by deferring to the notion of identity at the lower level. CPython compares the values of the PyObject* pointers, Jython identity-compares the Java references, PyPy does an is on the objectspace objects…
It's worth looking at the PyPy source, which requires the "x is y iff x and y are the same object" to be true in both directions. The top-level expression x is y is true iff, whatever objects wx and wy in the appropriate objectspace are, wy.is_(wx) is true, and is_ is implemented as wy is wx. So, x is y at level N iff y is x at level N-1.
Notice that this means you could pretty easily use PyPy to build a dialect of Python where is can be overridden, just by attaching is_ to a dunder method __is__ at the higher level. But there's a simpler way to do the same thing:
def is_(x, y):
    if hasattr(x, '__is__'):
        return x.__is__(y)
    elif hasattr(y, '__is__'):
        return y.__is__(x)
    else:
        return x is y
Now play with is_(x, y) instead of x is y, and see if you can find any fun trouble before doing the hard work of modifying the interpreter (even if it isn't that hard, in this case).
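As one way to start playing, here is a minimal sketch. The is_ function is repeated so the snippet runs on its own; the Proxy class and its __is__ method are hypothetical, made up purely for illustration:

```python
def is_(x, y):
    if hasattr(x, '__is__'):
        return x.__is__(y)
    elif hasattr(y, '__is__'):
        return y.__is__(x)
    else:
        return x is y


class Proxy:
    """Hypothetical wrapper that claims to be the object it wraps."""

    def __init__(self, target):
        self.target = target

    def __is__(self, other):
        # Unwrap another Proxy so two proxies of one target also match.
        if isinstance(other, Proxy):
            other = other.target
        return self.target is other


real = object()
proxy = Proxy(real)

print(is_(proxy, real))       # Proxy.__is__ answers
print(is_(real, proxy))       # reflected branch: proxy.__is__(real)
print(is_(real, object()))    # plain identity for ordinary objects
print(proxy is real)          # the real operator is unaffected
```

Some of the trouble shows up right away: is_(proxy, real) holds while proxy is real does not, so any code that still uses the real operator (or id) disagrees with the overridden notion of identity.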
So, what does is have to do with id? Could is be implemented on top of id, e.g., so that x is y just checks id(x) == id(y)? Well, here is what the docs say about id:
Return the “identity” of an object. This is an integer which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.
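The guarantees in that passage can be checked with a short sketch. Whether the last comparison comes out true is left open on purpose, since reuse of an id after an object dies is allowed but not required:

```python
x = object()
y = object()    # alive at the same time as x

# Two simultaneously existing objects never share an id.
distinct_while_alive = id(x) != id(y)

# An object's id does not change during its lifetime.
first = id(x)
stable = all(id(x) == first for _ in range(1000))

# These two temporaries need not coexist, so an implementation is allowed,
# but not required, to hand out the same id twice.
maybe_reused = id(object()) == id(object())

print(distinct_while_alive, stable)
print(maybe_reused)   # True or False depending on the interpreter
```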
So, the id of an object is unique and constant during its lifetime, and x is y is true iff they're the same object, therefore x is y is true iff id(x) == id(y), right?
Well, the name id can be rebound to whatever you want, and that isn't allowed to affect is. If you crafted the definition very carefully (keep in mind that if you discard the builtins reference to id, whatever implementation used to be there isn't even guaranteed to exist anymore, or to work correctly if it does exist…), you could define is on top of the default implementation of id.
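A minimal sketch of both halves of that argument. The replacement id and the helper is_via_id are hypothetical names introduced here; the only real API used is the builtins module:

```python
import builtins

x = object()
y = x
z = object()

real_id = builtins.id   # hold on to the original implementation


def id(obj):  # deliberately shadows the builtin in this module
    """Hypothetical replacement: claims every object has identity 0."""
    return 0


# The rebound name now says everything is "the same"...
fake_says_same = id(x) == id(z)

# ...but the is operator never looks the name up, so it is unaffected.
still_distinct = x is not z
still_same = x is y


def is_via_id(a, b):
    """An is built on id has to use the saved original, not the name."""
    return real_id(a) == real_id(b)


print(fake_says_same, still_distinct, still_same)
print(is_via_id(x, y), is_via_id(x, z))
```

Note that is_via_id only agrees with is here because both arguments are alive for the whole comparison.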
But it would be an odd thing to do. In CPython, id(x) just "returns the address of the object in memory", which is the same thing as the value of the pointer to the object. But that's just an artifact of CPython; there's nothing saying other implementations have to make id return the underlying value used for identity comparison as an integer. In fact, it's not clear how you'd even do that in an implementation written in a language without pointers (that can be cast to integers). In PyPy, the id of an object may even be a value computed the first time it's accessed and stashed in a dictionary in the objectspace, keyed by the object itself.
As for __hash__, you're misreading an important part of the docs:
"[...] x.__hash__() returns id(x)."
The part you ellipsized makes it clear that this is only true for instances of user-defined classes (that don't redefine __hash__). It's obviously not true for, e.g., tuple. In short, identity has nothing to do with hashing, except that for some objects the identity makes a convenient hash value.
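To make the contrast concrete, here is a small sketch with a hypothetical class Plain. It avoids asserting that hash(p) equals id(p), because the exact value of the default hash is an implementation detail:

```python
class Plain:
    """Hypothetical user-defined class that does not redefine __hash__."""


p = Plain()

# Instances fall back to object.__hash__, which the implementation is free
# to derive from the object's identity.
inherits_default = Plain.__hash__ is object.__hash__
hash_is_stable = hash(p) == hash(p)

# Tuples hash by value: two separately built tuples with equal contents
# have equal hashes, so their hash cannot simply be their id.
t1 = tuple([1, 2, 3])
t2 = tuple([1, 2, 3])
equal_hashes = hash(t1) == hash(t2)
distinct_objects = t1 is not t2

print(inherits_default, hash_is_stable)
print(equal_hashes, distinct_objects)
```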