Is there any reason x == x
is not evaluated quickly? I was hoping __eq__
would check if its two arguments are identical, and if so return True instantly. But it doesn't do it:
s = set(range(100000000))
s == s # this doesn't short-circuit, so takes ~1 sec
For built-ins, x == x
always returns True I think? For user-defined classes, I guess someone could define __eq__
that doesn't satisfy this property, but is there any reasonable use case for that?
The reason I want x == x
to be evaluated quickly is because it's a huge performance hit when memoizing functions with very large arguments:
from functools import lru_cache
@lru_cache()
def f(s):
return sum(s)
large_obj = frozenset(range(50000000))
f(large_obj) # this takes >1 sec every time
Note that the reason @lru_cache is repeatedly slow for large objects is not because it needs to calculate __hash__
(this is only done once and is then hard-cached as pointed out by @jsbueno), but because the dictionary's hash table needs to execute __eq__
every time to make sure it found the right object in the bucket (equality of hashes is obviously insufficient).
UPDATE:
It seems it's worth considering this question separately for three situations.
1) User-defined types (i.e., not built-in / standard library).
As @donkopotamus pointed out, there are cases where x == x
should not evaluate to True. For example, for numpy.array
and pandas.Series
types, the result is intentionally not convertible to boolean because it's unclear what the natural semantics should be (does False mean the container is empty, or does it mean all items in it are False?).
But here, there's no need for python to do anything, since the users can always short-circuit x == x
comparison themselves if it's appropriate:
def __eq__(self, other):
if self is other:
return True
# continue normal evaluation
2) Python built-in / standard library types.
a) Non-containers.
For all I know the short-circuit may already be implemented for this case - I can't tell since either way it's super fast.
b) Containers (including str
).
As @Karl Knechtel commented, adding short-circuit may hurt total performance if the savings from short-circuit are outweighed by the extra overhead in cases where self is not other
. While theoretically possible, even in that case the overhead is a small in relative terms (container comparison is never super-fast). And of course, in cases where short-circuit helps, the savings can be dramatic.
BTW, it turns out that str
does short-circuit: comparing huge identical strings is instant.
== is for value equality. It's used to know if two objects have the same value. is is for reference equality. It's used to know if two references refer (or point) to the same object, i.e if they're identical.
Use the equality operators == and != if you want to check whether or not two objects have the same value, regardless of where they're stored in memory. In the vast majority of cases, this is what you want to do.
In Python != is defined as not equal to operator. It returns True if operands on either side are not equal to each other, and returns False if they are equal. Whereas is not operator checks whether id() of two objects is same or not.
As you say, someone could quite easily define an __eq__
that you personally don't happen to approve of ... for example, the Institute of Electrical and Electronics Engineers might be so foolish as to do that:
>>> float("NaN") == float("NaN")
False
Another "unreasonable use case":
>>> bool(numpy.ma.masked == numpy.ma.masked)
False
Or even:
>>> numpy.arange(10) == numpy.arange(10)
array([ True, True, True, True, True, True, True, True, True, True], dtype=bool)
which has the audacity to not even be convertible to bool
!
So there is certainly practical scope for x == x
to not automagically be short-circuited to be true.
However the following is perhaps a good question:
Why doesn't
set.__eq__
check for instance identity?
Well, one might think ... because a set S
might contain NaN
and since NaN
cannot equal itself then surely such a set S
cannot equal itself? Investigating:
>>> s = set([float("NaN")])
>>> s == s
True
Hmm, that's interesting, especially since:
>>> {float("NaN")} == {float("NaN")}
False
This behaviour is due to Python's desire for sequences to be reflexive.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With