Slow equality evaluation for identical objects (x == x)

Tags: python, python-3.x, python-internals

Is there any reason x == x is not evaluated quickly? I was hoping __eq__ would check whether its two arguments are identical and, if so, return True instantly. But it doesn't:

s = set(range(100000000))
s == s # this doesn't short-circuit, so takes ~1 sec

For built-ins, x == x always returns True I think? For user-defined classes, I guess someone could define __eq__ that doesn't satisfy this property, but is there any reasonable use case for that?

The reason I want x == x to be evaluated quickly is because it's a huge performance hit when memoizing functions with very large arguments:

from functools import lru_cache

@lru_cache()
def f(s):
    return sum(s)

large_obj = frozenset(range(50000000))
f(large_obj) # this takes >1 sec every time

Note that the reason @lru_cache is repeatedly slow for large objects is not because it needs to calculate __hash__ (this is only done once and is then hard-cached as pointed out by @jsbueno), but because the dictionary's hash table needs to execute __eq__ every time to make sure it found the right object in the bucket (equality of hashes is obviously insufficient).
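To illustrate why matching hashes alone cannot settle a lookup, here is a minimal sketch. The class CollidingKey and its eq_calls counter are hypothetical names introduced only for this illustration, and every instance deliberately shares one hash value.

```python
class CollidingKey:
    """Hypothetical key type: all instances share one hash (an assumption for illustration)."""

    eq_calls = 0  # counts how often __eq__ runs

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        return 42  # deliberate collision between all keys

    def __eq__(self, other):
        CollidingKey.eq_calls += 1
        return isinstance(other, CollidingKey) and self.name == other.name


table = {CollidingKey("a"): 1}
CollidingKey.eq_calls = 0  # ignore anything that happened while building the dict

# Same hash as the stored key, but a different key: only __eq__ can tell them apart.
found = CollidingKey("b") in table
print(found, CollidingKey.eq_calls)
```

The lookup reports that the key is absent, and the counter shows that __eq__ had to run to reach that answer.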

UPDATE:

It seems it's worth considering this question separately for three situations.

1) User-defined types (i.e., not built-in / standard library).

As @donkopotamus pointed out, there are cases where x == x should not evaluate to True. For example, for numpy.ndarray and pandas.Series, the result of the comparison is intentionally not convertible to a boolean because it's unclear what the natural semantics should be (does False mean the container is empty, or does it mean all items in it are False?).

But here, there's no need for Python to do anything, since users can always short-circuit the x == x comparison themselves if it's appropriate:

def __eq__(self, other):
    if self is other:
        return True
    # continue normal evaluation
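A fuller sketch of that pattern, assuming a made-up container class: Bag and its slow_comparisons counter are hypothetical and exist only to show when the element-wise comparison is skipped.

```python
class Bag:
    """Hypothetical container used only to illustrate the identity shortcut."""

    def __init__(self, items):
        self.items = list(items)
        self.slow_comparisons = 0  # how many times the full comparison ran

    def __eq__(self, other):
        if self is other:
            return True  # identical objects: skip the element-wise work
        if not isinstance(other, Bag):
            return NotImplemented
        self.slow_comparisons += 1
        return self.items == other.items

    # Defining __eq__ sets __hash__ to None; Bag is mutable, so that is left as is.


a = Bag(range(1000))
b = Bag(range(1000))
print(a == a, a.slow_comparisons)  # shortcut taken, counter unchanged
print(a == b, a.slow_comparisons)  # distinct objects, full comparison runs
```

Whether this is appropriate depends on the type: it bakes in the assumption that an object always equals itself.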

2) Python built-in / standard library types.

a) Non-containers.

For all I know the short-circuit may already be implemented for this case - I can't tell since either way it's super fast.

b) Containers (including str).

As @Karl Knechtel commented, adding a short-circuit may hurt total performance if the savings from it are outweighed by the extra overhead in cases where self is not other. While theoretically possible, even in that case the overhead is small in relative terms (container comparison is never super-fast). And of course, in cases where the short-circuit helps, the savings can be dramatic.

BTW, it turns out that str does short-circuit: comparing huge identical strings is instant.
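One rough way to check this on your own machine is a small timing sketch. The helper time_self_comparison is a hypothetical name, the sizes are arbitrary (and smaller than in the examples above), and the numbers will vary with the interpreter and hardware.

```python
import timeit


def time_self_comparison(obj, repeats=5):
    """Return the best-of-`repeats` time in seconds for evaluating obj == obj once."""
    return min(timeit.repeat(lambda: obj == obj, number=1, repeat=repeats))


big_set = set(range(1_000_000))
big_str = "a" * 1_000_000

set_seconds = time_self_comparison(big_set)
str_seconds = time_self_comparison(big_str)

print(f"set == set: {set_seconds:.6f} s")
print(f"str == str: {str_seconds:.6f} s")
```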

212

asked Aug 05 '16 00:08

max

1 Answer

As you say, someone could quite easily define an __eq__ that you personally don't happen to approve of ... for example, the Institute of Electrical and Electronics Engineers might be so foolish as to do that:

>>> float("NaN") == float("NaN")
False

Another "unreasonable use case":

>>> bool(numpy.ma.masked == numpy.ma.masked)
False

Or even:

>>> numpy.arange(10) == numpy.arange(10)
array([ True,  True,  True,  True,  True,  True,  True,  True,  True,  True], dtype=bool)

which has the audacity to not even be convertible to bool!

So there is certainly practical scope for x == x to not automagically be short-circuited to be true.
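A short sketch of what a blanket identity shortcut would break: a single NaN object is identical to itself yet, under IEEE 754, not equal to itself. The helper naive_eq is hypothetical and merely stands in for the proposed shortcut.

```python
import math


def naive_eq(a, b):
    """Hypothetical '==' with an unconditional identity shortcut."""
    if a is b:
        return True
    return a == b


nan = float("nan")

print(nan is nan)          # True: it is one and the same object
print(nan == nan)          # False: IEEE 754 says NaN never equals anything
print(naive_eq(nan, nan))  # True: the shortcut disagrees with ==
print(math.isnan(nan))     # True: the reliable way to test for NaN
```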

Going Off Course

However the following is perhaps a good question:

Why doesn't set.__eq__ check for instance identity?

Well, one might think ... because a set S might contain NaN and since NaN cannot equal itself then surely such a set S cannot equal itself? Investigating:

>>> s = set([float("NaN")])
>>> s == s
True

Hmm, that's interesting, especially since:

>>> {float("NaN")} == {float("NaN")}
False

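This behaviour is due to Python's desire for collections to be reflexive: when the built-in containers compare elements, they test identity first and fall back to == only if the objects are distinct. In the first example both sides hold the same NaN object; in the second, each set holds its own.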
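The identity-before-equality behaviour of the built-in containers can be observed directly. This is a small sketch; the variable names are arbitrary.

```python
nan = float("nan")
other_nan = float("nan")  # a second, distinct NaN object

print(nan == nan)            # False: NaN is never equal to itself
print(nan in [nan])          # True: membership checks identity first
print([nan] == [nan])        # True: both lists hold the same object
print(other_nan in [nan])    # False: different object, and == fails
print({nan} == {nan})        # True: same object on both sides
print({nan} == {other_nan})  # False: two distinct NaN objects
```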

164

answered Nov 13 '22 15:11

donkopotamus
