Once, after watching Mike Muller's performance optimization tutorial (I think it was this one), one idea stuck in my head: if performance matters, minimize indexed access inside loops. For example, if you need to access x[1] several times inside a loop for x in l, assign x[1] to a local variable once and reuse that variable in the loop body.
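For readers who have not seen that trick, here is a minimal sketch of the hoisting idea (the data and variable names below are mine, not from the tutorial):

rows = [(1, 2), (3, 4), (5, 6)]

total = 0
for x in rows:
    second = x[1]               # index once per iteration...
    total += second * second    # ...and reuse the local instead of writing x[1] twice

print(total)  # 4 + 16 + 36 = 56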
Now I have this synthetic example:
import timeit

SEQUENCE = zip(range(1000), range(1, 1001))

def no_unpacking():
    return [item[0] + item[1] for item in SEQUENCE]

def unpacking():
    return [a + b for a, b in SEQUENCE]

print timeit.Timer('no_unpacking()', 'from __main__ import no_unpacking').timeit(10000)
print timeit.Timer('unpacking()', 'from __main__ import unpacking').timeit(10000)
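Side note: the snippet above is Python 2.7 code, where zip() returns a list and print is a statement. A rough Python 3 equivalent (which I have not timed here) would be:

import timeit

# In Python 3, zip() returns an iterator, so materialize it once
# to allow repeated iteration inside the timed functions.
SEQUENCE = list(zip(range(1000), range(1, 1001)))

def no_unpacking():
    return [item[0] + item[1] for item in SEQUENCE]

def unpacking():
    return [a + b for a, b in SEQUENCE]

print(timeit.Timer('no_unpacking()', 'from __main__ import no_unpacking').timeit(10000))
print(timeit.Timer('unpacking()', 'from __main__ import unpacking').timeit(10000))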
The unpacking() and no_unpacking() functions return the same result. Their implementations differ: unpacking() unpacks each item into a and b inside the loop, while no_unpacking() gets the values by index.
On Python 2.7, this prints:
1.25280499458
0.946601867676
In other words, unpacking() outperforms no_unpacking() by roughly 25% (0.9466 / 1.2528 ≈ 0.76, i.e. about 24% less time).
The question is: what is the reason for this difference, i.e. why is unpacking() faster?
Bonus question: under pypy there is almost no difference between these two functions, performance-wise. Why is that?
Thanks for any help.
To answer your questions we can check the bytecode generated by the two functions using the dis
module:
In [5]: def no_unpacking():
   ...:     s = []
   ...:     for item in SEQUENCE:
   ...:         s.append(item[0] + item[1])
   ...:     return s
   ...:
   ...:
   ...: def unpacking():
   ...:     s = []
   ...:     for a, b in SEQUENCE:
   ...:         s.append(a + b)
   ...:     return s
I have expanded the list comprehension into an explicit loop because in Python 3 it would be more cumbersome to inspect the interesting bytecodes. The code is equivalent, so it doesn't matter for our purposes.
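As a quick sanity check (my addition, not part of the original reasoning), the two expanded functions build identical lists for a sequence of pairs:

SEQUENCE = list(zip(range(1000), range(1, 1001)))   # list() so it survives repeated iteration on Python 3
assert no_unpacking() == unpacking()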
The bytecode for the first function is:
In [6]: dis.dis(no_unpacking)
2 0 BUILD_LIST 0
3 STORE_FAST 0 (s)
3 6 SETUP_LOOP 39 (to 48)
9 LOAD_GLOBAL 0 (SEQUENCE)
12 GET_ITER
>> 13 FOR_ITER 31 (to 47)
16 STORE_FAST 1 (item)
4 19 LOAD_FAST 0 (s)
22 LOAD_ATTR 1 (append)
25 LOAD_FAST 1 (item)
28 LOAD_CONST 1 (0)
31 BINARY_SUBSCR
32 LOAD_FAST 1 (item)
35 LOAD_CONST 2 (1)
38 BINARY_SUBSCR
39 BINARY_ADD
40 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
43 POP_TOP
44 JUMP_ABSOLUTE 13
>> 47 POP_BLOCK
5 >> 48 LOAD_FAST 0 (s)
51 RETURN_VALUE
Note that the loop has to call BINARY_SUBSCR
twice to access the two elements of the tuple.
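If you prefer to count those opcodes programmatically instead of reading the disassembly by eye, here is a small check of my own (it assumes Python 3.4+, where dis.get_instructions exists, and an interpreter where the opcode is still named BINARY_SUBSCR):

import dis

ops = [ins.opname for ins in dis.get_instructions(no_unpacking)]
print(ops.count('BINARY_SUBSCR'))    # 2 -- one per tuple element accessed by index
print(ops.count('UNPACK_SEQUENCE'))  # 0 -- this version never unpacks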
The bytecode for the second function is:
In [7]: dis.dis(unpacking)
9 0 BUILD_LIST 0
3 STORE_FAST 0 (s)
10 6 SETUP_LOOP 37 (to 46)
9 LOAD_GLOBAL 0 (SEQUENCE)
12 GET_ITER
>> 13 FOR_ITER 29 (to 45)
16 UNPACK_SEQUENCE 2
19 STORE_FAST 1 (a)
22 STORE_FAST 2 (b)
11 25 LOAD_FAST 0 (s)
28 LOAD_ATTR 1 (append)
31 LOAD_FAST 1 (a)
34 LOAD_FAST 2 (b)
37 BINARY_ADD
38 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
41 POP_TOP
42 JUMP_ABSOLUTE 13
>> 45 POP_BLOCK
12 >> 46 LOAD_FAST 0 (s)
49 RETURN_VALUE
Note how there is no BINARY_SUBSCR
to perform.
So, it seems that one UNPACK_SEQUENCE plus one extra STORE_FAST (the operations added by the unpacking) are faster than doing two BINARY_SUBSCRs.
This is reasonable, since BINARY_SUBSCR is a full-blown method call, while UNPACK_SEQUENCE and STORE_FAST are simpler operations.
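To see why BINARY_SUBSCR is comparatively heavy, here is an illustration of my own (not from the original answer): subscripting goes through the general item-access protocol, so for user-defined classes it ends up calling __getitem__, whereas STORE_FAST simply writes into a numbered local slot.

import dis

class Pair(object):
    def __init__(self, a, b):
        self._items = (a, b)

    def __getitem__(self, index):
        # BINARY_SUBSCR on a Pair instance dispatches here
        return self._items[index]

p = Pair(1, 2)
print(p[0] + p[1])   # 3 -- each p[i] goes through __getitem__

def add_pair(item):
    return item[0] + item[1]

dis.dis(add_pair)    # shows the subscript opcodes (BINARY_SUBSCR on the interpreters discussed here)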
You can see the difference even in simpler cases:
In [1]: def iter_with_index(s):
   ...:     for i in range(len(s)):
   ...:         s[i]
   ...:

In [2]: def iter_without_index(s):
   ...:     for el in s:
   ...:         el
   ...:
In [3]: %%timeit s = 'a' * 10000
...: iter_with_index(s)
...:
1000 loops, best of 3: 583 us per loop
In [4]: %%timeit s = 'a' * 10000
...: iter_without_index(s)
...:
1000 loops, best of 3: 206 us per loop
As you can see, iterating over a string with explicit indexing is about 3 times slower (583 us vs 206 us per loop).
All of that overhead comes from the calls to BINARY_SUBSCR.
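If you are not inside IPython, roughly the same comparison can be run with a plain timeit script (a sketch of mine; absolute numbers will vary with machine and interpreter):

import timeit

def iter_with_index(s):
    for i in range(len(s)):
        s[i]

def iter_without_index(s):
    for el in s:
        el

SETUP = "from __main__ import iter_with_index, iter_without_index; s = 'a' * 10000"

print(timeit.timeit('iter_with_index(s)', setup=SETUP, number=1000))
print(timeit.timeit('iter_without_index(s)', setup=SETUP, number=1000))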
Regarding your bonus question: PyPy has a JIT that can analyze the code and produce an optimized version that avoids the overhead of the indexing operations.
When it realizes that the subscripting is done on tuples, it is probably able to emit code that accesses the elements directly instead of going through the tuple's method, thus removing the BINARY_SUBSCR operations entirely.