Why is early return slower than else?

Question

This is a pure guess, and I haven't figured out an easy way to check whether it is right, but I have a theory for you.

I tried your code and get the same of results, without_else() is repeatedly slightly slower than with_else():

>>> T(lambda : without_else()).repeat()
[0.42015745017874906, 0.3188967452567226, 0.31984281521812363]
>>> T(lambda : with_else()).repeat()
[0.36009842032996175, 0.28962249392031936, 0.2927151355828528]
>>> T(lambda : without_else(True)).repeat()
[0.31709728471076915, 0.3172671387005721, 0.3285821242644147]
>>> T(lambda : with_else(True)).repeat()
[0.30939889008243426, 0.3035132258429485, 0.3046679117038593]

Considering that the bytecode is identical, the only difference is the name of the function. In particular the timing test does a lookup on the global name. Try renaming without_else() and the difference disappears:

>>> def no_else(param=False):
    if param:
        return 1
    return 0

>>> T(lambda : no_else()).repeat()
[0.3359846013948413, 0.29025818923918223, 0.2921801513879245]
>>> T(lambda : no_else(True)).repeat()
[0.3810395594970828, 0.2969634408842694, 0.2960104566362247]

My guess is that without_else has a hash collision with something else in globals() so the global name lookup is slightly slower.

Edit: A dictionary with 7 or 8 keys probably has 32 slots, so on that basis without_else has a hash collision with __builtins__:

>>> [(k, hash(k) % 32) for k in globals().keys() ]
[('__builtins__', 8), ('with_else', 9), ('__package__', 15), ('without_else', 8), ('T', 21), ('__name__', 25), ('no_else', 28), ('__doc__', 29)]

To clarify how the hashing works:

__builtins__ hashes to -1196389688 which reduced modulo the table size (32) means it is stored in the #8 slot of the table.

without_else hashes to 505688136 which reduced modulo 32 is 8 so there's a collision. To resolve this Python calculates:

Starting with:

j = hash % 32
perturb = hash

Repeat this until we find a free slot:

j = (5*j) + 1 + perturb;
perturb >>= 5;
use j % 2**i as the next table index;

which gives it 17 to use as the next index. Fortunately that's free so the loop only repeats once. The hash table size is a power of 2, so 2**i is the size of the hash table, i is the number of bits used from the hash value j.

Each probe into the table can find one of these:

The slot is empty, in that case the probing stops and we know the value is not in the table.
The slot is unused but was used in the past in which case we go try the next value calculated as above.
The slot is full but the full hash value stored in the table isn't the same as the hash of the key we are looking for (that's what happens in the case of __builtins__ vs without_else).
The slot is full and has exactly the hash value we want, then Python checks to see if the key and the object we are looking up are the same object (which in this case they will be because short strings that could be identifiers are interned so identical identifiers use the exact same string).
Finally when the slot is full, the hash matches exactly, but the keys are not the identical object, then and only then will Python try comparing them for equality. This is comparatively slow, but in the case of name lookups shouldn't actually happen.

Why is early return slower than else?

Tags:

python

optimization

python-2.7

Recent Activity

Donate For Us

Why is early return slower than else?

Tags:

python

optimization

python-2.7

Related questions

Recent Activity

Donate For Us