Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Slow equality evaluation for identical objects (x == x)

Is there any reason x == x is not evaluated quickly? I was hoping __eq__ would check if its two arguments are identical, and if so return True instantly. But it doesn't do it:

s = set(range(100000000))
s == s # this doesn't short-circuit, so takes ~1 sec

For built-ins, x == x always returns True I think? For user-defined classes, I guess someone could define __eq__ that doesn't satisfy this property, but is there any reasonable use case for that?

The reason I want x == x to be evaluated quickly is because it's a huge performance hit when memoizing functions with very large arguments:

from functools import lru_cache
@lru_cache()
def f(s):
    return sum(s)
large_obj = frozenset(range(50000000))
f(large_obj) # this takes >1 sec every time

Note that the reason @lru_cache is repeatedly slow for large objects is not because it needs to calculate __hash__ (this is only done once and is then hard-cached as pointed out by @jsbueno), but because the dictionary's hash table needs to execute __eq__ every time to make sure it found the right object in the bucket (equality of hashes is obviously insufficient).

UPDATE:

It seems it's worth considering this question separately for three situations.

1) User-defined types (i.e., not built-in / standard library).

As @donkopotamus pointed out, there are cases where x == x should not evaluate to True. For example, for numpy.array and pandas.Series types, the result is intentionally not convertible to boolean because it's unclear what the natural semantics should be (does False mean the container is empty, or does it mean all items in it are False?).

But here, there's no need for python to do anything, since the users can always short-circuit x == x comparison themselves if it's appropriate:

def __eq__(self, other):
  if self is other:
    return True
  # continue normal evaluation

2) Python built-in / standard library types.

a) Non-containers.

For all I know the short-circuit may already be implemented for this case - I can't tell since either way it's super fast.

b) Containers (including str).

As @Karl Knechtel commented, adding short-circuit may hurt total performance if the savings from short-circuit are outweighed by the extra overhead in cases where self is not other. While theoretically possible, even in that case the overhead is a small in relative terms (container comparison is never super-fast). And of course, in cases where short-circuit helps, the savings can be dramatic.

BTW, it turns out that str does short-circuit: comparing huge identical strings is instant.

like image 212
max Avatar asked Aug 05 '16 00:08

max


People also ask

Is there a difference between == and is?

== is for value equality. It's used to know if two objects have the same value. is is for reference equality. It's used to know if two references refer (or point) to the same object, i.e if they're identical.

How do you compare two objects and check whether they have same memory locations?

Use the equality operators == and != if you want to check whether or not two objects have the same value, regardless of where they're stored in memory. In the vast majority of cases, this is what you want to do.

Is not VS != Python?

In Python != is defined as not equal to operator. It returns True if operands on either side are not equal to each other, and returns False if they are equal. Whereas is not operator checks whether id() of two objects is same or not.


1 Answers

As you say, someone could quite easily define an __eq__ that you personally don't happen to approve of ... for example, the Institute of Electrical and Electronics Engineers might be so foolish as to do that:

>>> float("NaN") == float("NaN")
False

Another "unreasonable use case":

>>> bool(numpy.ma.masked == numpy.ma.masked)
False

Or even:

>>> numpy.arange(10) == numpy.arange(10)
array([ True,  True,  True,  True,  True,  True,  True,  True,  True,  True], dtype=bool)

which has the audacity to not even be convertible to bool!

So there is certainly practical scope for x == x to not automagically be short-circuited to be true.

Going Off Course

However the following is perhaps a good question:

Why doesn't set.__eq__ check for instance identity?

Well, one might think ... because a set S might contain NaN and since NaN cannot equal itself then surely such a set S cannot equal itself? Investigating:

>>> s = set([float("NaN")])
>>> s == s
True

Hmm, that's interesting, especially since:

>>> {float("NaN")} == {float("NaN")}
False

This behaviour is due to Python's desire for sequences to be reflexive.

like image 164
donkopotamus Avatar answered Nov 13 '22 15:11

donkopotamus