 

Does the `is` operator use a __magic__ method in Python?

The is operator is used to test for identity.

I was wondering whether the is operator and the id() function call any __magic__ method, the way == calls __eq__.

I had some fun checking out __hash__:

import random

class Foo(object):
    def __hash__(self):
        # A different, random hash on every call.
        return random.randint(0, 2 ** 32)

a = Foo()
b = {}
for i in range(5000):
    b[a] = i

Think about dict b and the value of b[a].

Every subsequent lookup of b[a] either raises a KeyError or returns a random one of the stored integers.
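To make the effect reproducible, here is a minimal sketch that swaps the random hash for a hypothetical CountingHash class whose __hash__ returns a new value on every call (a deterministic stand-in for random.randint). The dict ends up holding one entry per assignment, all keyed by the very same object, and a fresh lookup misses because its hash matches none of the stored ones:

```python
import itertools


class CountingHash:
    """Hypothetical stand-in for Foo: every hash() call returns a new value."""

    def __init__(self):
        self._counter = itertools.count()

    def __hash__(self):
        return next(self._counter)


a = CountingHash()
b = {}
for i in range(5000):
    b[a] = i  # each assignment sees a different hash, so it adds a new entry

print(len(b))                   # 5000: one entry per assignment
print(all(k is a for k in b))   # True: every key is the same object
print(a in b)                   # False: this lookup's hash matches no entry
```

The dict only falls back to comparing keys (identity first, then __eq__) when the stored hash matches, so an identical object with a changing hash is never found.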

But as the docs on the special methods state:

[the default implementation of] x.__hash__() returns id(x).

So there is a relation between the two, but only the other way around.

I've seen many questions on is and id here, and the answers have helped many confused minds, but I couldn't find an answer to this one.

asked Mar 14 '13 00:03 by Chris Wesseling





2 Answers

No. The is operator is a straight pointer comparison, and id just returns the address of the object cast to a long.

From ceval.c:

case PyCmp_IS:
    res = (v == w);
    break;
case PyCmp_IS_NOT:
    res = (v != w);
    break;

v and w here are simply PyObject *.
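The same point can be seen from Python code. In this sketch (the class is hypothetical), a class whose __eq__ always returns True and which also defines an __is__ method still cannot make is report True for two distinct instances, because is never looks up any method at all:

```python
class Liar:
    """Hypothetical class that tries hard to look identical to everything."""

    def __eq__(self, other):
        return True  # == dispatches here

    def __is__(self, other):
        # Not a real special method: the interpreter never calls this.
        return True


x = Liar()
y = Liar()
print(x == y)      # True: __eq__ is consulted
print(x is y)      # False: is compares object identity only
print(x is not y)  # True
```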

From bltinmodule.c:

static PyObject *
builtin_id(PyObject *self, PyObject *v)
{
    return PyLong_FromVoidPtr(v);
}

PyDoc_STRVAR(id_doc,
"id(object) -> integer\n\
\n\
Return the identity of an object. This is guaranteed to be unique among\n\
simultaneously existing objects. (Hint: it's the object's memory address.)");
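A quick check of the guarantees in that docstring, as a sketch: for objects that are alive at the same time, comparing id() values agrees with is, and an object's id() does not change while it lives. That the value is a memory address is a CPython detail, as the hint says, and the last line shows why ids of temporaries should not be compared:

```python
a = [1, 2, 3]
b = [1, 2, 3]   # equal to a, but a distinct object
c = a           # a second name for the same object

print(a == b, a is b, id(a) == id(b))  # True False False
print(a is c, id(a) == id(c))          # True True

before = id(a)
a.append(4)                            # mutation does not change identity
print(id(a) == before)                 # True

# Temporaries do not exist simultaneously: the first list may be freed
# before the second is created, so their ids may or may not coincide.
print(id([]) == id([]))
```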
answered Oct 21 '22 21:10 by nneonneo



The short answer is: No, they do not. As the docs that you link to say:

The operators is and is not test for object identity: x is y is true if and only if x and y are the same object.

Being "the same object" is not something you're allowed to override. If your object is not the same object as another, it cannot pretend to be.
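Even an object built specifically to stand in for another one, such as a proxy from the standard library's weakref module, forwards attribute access but is still a different object as far as is is concerned. A minimal sketch (the Target class is made up for illustration):

```python
import weakref


class Target:
    """Made-up class; plain instances support weak references."""

    def __init__(self):
        self.value = 42


obj = Target()
p = weakref.proxy(obj)

print(p.value)                      # 42, forwarded to obj
print(type(p) is weakref.ProxyType) # True: p is its own proxy object
print(p is obj)                     # False: it cannot pretend to be obj
```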


So… Why? What would be the harm of letting you override is and/or id? Obviously it would almost always be a stupid thing to do, but Python lets you do a lot of stupid things if you try hard enough.

The design FAQ and similar documents don't say. But I suspect it's primarily because it makes debugging Python, and some of the deeper standard library modules, much easier if there is some way, from within the interpreter, to verify that two names really do refer to the same object, or to print out the id to make sure a name hasn't changed over time, etc. Imagine debugging weakref, or even pickle, without that.


So, what exactly does "same object" mean? Well, that's up to the interpreter. Obviously it has to be impossible to distinguish two instances of the same object at the language level, and probably at the interpreter level as well (especially since there's a well-defined API for plugging into most interpreter implementations).

All of the major implementations handle this by deferring to the notion of identity at the lower level. CPython compares the values of the PyObject* pointers, Jython identity-compares the Java references, PyPy does an is on the objectspace objects…

It's worth looking at the PyPy source, which requires "x is y iff x and y are the same object" to hold in both directions. The top-level expression x is y is true iff wy.is_(wx) is true, where wx and wy are whatever objects represent x and y in the appropriate objectspace, and is_ is implemented as wy is wx. So, x is y at level N iff y is x at level N-1.
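To make the "level N versus level N-1" idea concrete, here is a toy model, not PyPy's real implementation: objects of the interpreted language are represented by hypothetical W_Object wrappers, and a hypothetical Space.is_ answers the user-level identity question by asking the host-level is about the wrappers, in the reversed order described above:

```python
class W_Object:
    """Hypothetical wrapper: one per object of the interpreted language."""

    def __init__(self, payload):
        self.payload = payload


class Space:
    """Toy object space; loosely modeled on the text, not on PyPy's code."""

    def wrap(self, payload):
        return W_Object(payload)

    def is_(self, w_x, w_y):
        # User-level "x is y" holds iff the host-level wrappers are identical.
        return w_y is w_x


space = Space()
w_a = space.wrap([1, 2])
w_b = space.wrap([1, 2])
w_alias = w_a  # the interpreted program binds a second name to the same object

print(space.is_(w_a, w_alias))  # True
print(space.is_(w_a, w_b))      # False, even though the payloads are equal
```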


Notice that this means you could pretty easily use PyPy to build a dialect of Python where is can be overridden, just by attaching is_ to a dunder method __is__ at the higher level. But there's a simpler way to do the same thing:

def is_(x, y):
    if hasattr(x, '__is__'):
        return x.__is__(y)
    elif hasattr(y, '__is__'):
        return y.__is__(x)
    else:
        return x is y

Now play with is_(x, y) instead of x is y, and see if you can find any fun trouble before doing the hard work of modifying the interpreter (even if it isn't that hard, in this case).
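As one example of the "fun trouble": with a hypothetical AlwaysSame class whose __is__ always says yes, is_ starts reporting that the object is None, which would quietly break the common "if x is None" idiom if is itself behaved this way. The sketch repeats is_ from above so it runs on its own:

```python
def is_(x, y):
    """Overridable identity test from the answer above."""
    if hasattr(x, '__is__'):
        return x.__is__(y)
    elif hasattr(y, '__is__'):
        return y.__is__(x)
    else:
        return x is y


class AlwaysSame:
    """Hypothetical class that claims to be every other object."""

    def __is__(self, other):
        return True


thing = AlwaysSame()
print(is_(thing, None))  # True: the override claims thing is None
print(is_(None, thing))  # True: the y branch answers for the right operand
print(thing is None)     # False: the real operator is unaffected
print(is_(1, 2))         # False: falls back to the built-in is
```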


So, what does is have to do with id? Could is be implemented on top of id, e.g., with x is y just checking id(x) == id(y)? Well, here is what the docs say about id:

Return the “identity” of an object. This is an integer which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.

So, the id of an object is unique and constant during its lifetime, and x is y is true iff they're the same object, therefore x is y is true iff id(x) == id(y), right?

Well, id can be rebound to whatever you want, and that isn't allowed to affect is. If you crafted the definition very carefully (keep in mind that if you discard the builtins reference to id, whatever implementation used to be there isn't even guaranteed to exist anymore, or to work correctly if it does exist…), you could define is on top of the default implementation of id.

But it would be an odd thing to do. In CPython, id(x) just "returns the address of the object in memory", which is the same thing as the value of the pointer to the object. But that's just an artifact of CPython; nothing says other implementations have to make id return, as an integer, the underlying value used for identity comparison. In fact, it's not clear how you'd even do that in an implementation written in a language without pointers (that can be cast to integers). In PyPy, the id of an object may even be a value computed the first time it's accessed and stashed in a dictionary in the objectspace, keyed by the object itself.
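To illustrate that an id need not be a memory address, here is a sketch of a hypothetical LazyIds registry that hands out small integers the first time an object's id is requested, finding known objects with is. It only loosely mirrors the PyPy approach described above: it uses a linear scan instead of an identity-keyed dictionary, and it keeps every registered object alive, so its ids are never reused:

```python
import itertools


class LazyIds:
    """Hypothetical registry: assigns an integer id on first request."""

    def __init__(self):
        self._entries = []  # (object, assigned id) pairs
        self._next = itertools.count(1)

    def id_of(self, obj):
        for known, ident in self._entries:
            if known is obj:  # identity lookup; never calls __eq__ or __hash__
                return ident
        ident = next(self._next)
        self._entries.append((obj, ident))
        return ident


ids = LazyIds()
x, y = [1], [1]  # equal but distinct lists
print(ids.id_of(x), ids.id_of(y), ids.id_of(x))  # 1 2 1
```

Note the direction: this builds an id on top of is, which works in any implementation, whereas building is on top of id depends on how id happens to be implemented.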


As for __hash__, you're misreading an important part of the docs.

[...] x.__hash__() returns id(x).

The part you ellipsized makes it clear that this is only true for instances of user-defined classes (that don't redefine __hash__). It's obviously not true for, e.g., tuple. In short, identity has nothing to do with hashing, except that for some objects the identity makes a convenient hash value.
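A small sketch of the distinction. In CPython 3 the default object.__hash__ is computed from the object's address but need not equal id(x), while a tuple's hash comes from its contents, so two distinct but equal tuples hash the same. Defining __eq__ without __hash__ in a class makes its instances unhashable. The class names here are made up:

```python
class Plain:
    """Made-up class that keeps the default __eq__ and __hash__."""


class EqOnly:
    """Made-up class that defines __eq__ but not __hash__."""

    def __eq__(self, other):
        return isinstance(other, EqOnly)


p = Plain()
print(hash(p) == hash(p))              # True: stable for the object's lifetime

t1 = tuple([1, 2])
t2 = tuple([1, 2])                     # equal contents, separate object
print(t1 is t2, hash(t1) == hash(t2))  # False True

print(EqOnly.__hash__ is None)         # True: instances are unhashable
```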

answered Oct 21 '22 21:10 by abarnert
