What follows is four functions that have the same output, but either written with a list comprehension or a tight loop, and a function call to vs an inline condition.
Interestingly, a
and b
have the same bytecode when disassembled, however b
is much faster than a
.
Moreover, d
, which uses a tight loop with no function call, is faster than a
which uses a list comprehension with a function call.
Why do functions a and b have the same bytecode, and why does b perform much better than a given the same bytecode?
import dis
def my_filter(n):
return n < 5
def a():
# list comprehension with function call
return [i for i in range(10) if my_filter(i)]
def b():
# list comprehension without function call
return [i for i in range(10) if i < 5]
def c():
# tight loop with function call
values = []
for i in range(10):
if my_filter(i):
values.append(i)
return values
def d():
# tight loop without function call
values = []
for i in range(10):
if i < 5:
values.append(i)
return values
assert a() == b() == c() == d()
import sys
>>> sys.version_info[:]
(3, 6, 5, 'final', 0)
# list comprehension with function call
>>> dis.dis(a)
2 0 LOAD_CONST 1 (<code object <listcomp> at 0x00000211CBE8B300, file "<stdin>", line 2>)
2 LOAD_CONST 2 ('a.<locals>.<listcomp>')
4 MAKE_FUNCTION 0
6 LOAD_GLOBAL 0 (range)
8 LOAD_CONST 3 (10)
10 CALL_FUNCTION 1
12 GET_ITER
14 CALL_FUNCTION 1
16 RETURN_VALUE
# list comprehension without function call
>>> dis.dis(b)
2 0 LOAD_CONST 1 (<code object <listcomp> at 0x00000211CBB64270, file "<stdin>", line 2>)
2 LOAD_CONST 2 ('b.<locals>.<listcomp>')
4 MAKE_FUNCTION 0
6 LOAD_GLOBAL 0 (range)
8 LOAD_CONST 3 (10)
10 CALL_FUNCTION 1
12 GET_ITER
14 CALL_FUNCTION 1
16 RETURN_VALUE
# a and b have the same byte code?
# Why doesn't a have a LOAD_GLOBAL (my_filter) and CALL_FUNCTION?
# c below has both of these
# tight loop with function call
>>> dis.dis(c)
2 0 BUILD_LIST 0
2 STORE_FAST 0 (values)
3 4 SETUP_LOOP 34 (to 40)
6 LOAD_GLOBAL 0 (range)
8 LOAD_CONST 1 (10)
10 CALL_FUNCTION 1
12 GET_ITER
>> 14 FOR_ITER 22 (to 38)
16 STORE_FAST 1 (i)
4 18 LOAD_GLOBAL 1 (my_filter)
20 LOAD_FAST 1 (i)
22 CALL_FUNCTION 1
24 POP_JUMP_IF_FALSE 14
5 26 LOAD_FAST 0 (values)
28 LOAD_ATTR 2 (append)
30 LOAD_FAST 1 (i)
32 CALL_FUNCTION 1
34 POP_TOP
36 JUMP_ABSOLUTE 14
>> 38 POP_BLOCK
6 >> 40 LOAD_FAST 0 (values)
42 RETURN_VALUE
# tight loop without function call
>>> dis.dis(d)
2 0 BUILD_LIST 0
2 STORE_FAST 0 (values)
3 4 SETUP_LOOP 34 (to 40)
6 LOAD_GLOBAL 0 (range)
8 LOAD_CONST 1 (10)
10 CALL_FUNCTION 1
12 GET_ITER
>> 14 FOR_ITER 22 (to 38)
16 STORE_FAST 1 (i)
4 18 LOAD_FAST 1 (i)
20 LOAD_CONST 2 (5)
22 COMPARE_OP 0 (<)
24 POP_JUMP_IF_FALSE 14
5 26 LOAD_FAST 0 (values)
28 LOAD_ATTR 1 (append)
30 LOAD_FAST 1 (i)
32 CALL_FUNCTION 1
34 POP_TOP
36 JUMP_ABSOLUTE 14
>> 38 POP_BLOCK
6 >> 40 LOAD_FAST 0 (values)
42 RETURN_VALUE
import timeit
>>> timeit.timeit(a) # list comprehension with my_filter
1.2435139456834463
>>> timeit.timeit(b) # list comprehension without my_filter
0.6717423789164627
>>> timeit.timeit(c) # no list comprehension with my_filter
1.326850592144865
>>> timeit.timeit(d) # no list comprehension no my_filter
0.7743895521070954
Why do a
and b
have the same byte code when disassembled? I would have expected b
to have better looking bytecode. Notably I would have thought that a
would need a LOAD_GLOBAL ? (my_filter)
and a CALL FUNCTION
. For example, c
is the same as a
but without the list comprehension, and it uses these bytecodes these on addresses 18 and 22.
However, even with the same bytecode, b
performs much better than a
. What's going on here?
Even more interesting, d
, which uses a tight loop but does not have the call to my_filter
, is faster than b
which uses the list comprehension but has the call to my_filter
. It looks like the overhead of using a function outweighs the overhead of a tight loop.
My goal here is to try to figure out if I can factor out conditions of a list comprehension into a function to make the list comprehension easier to read.
The disassembler converts the byte-compiled code into human-readable form. The byte-code interpreter is implemented as a simple stack machine. It pushes values onto a stack of its own, then pops them off to use them in calculations whose results are themselves pushed back on the stack.
Source code: Lib/dis.py. The dis module supports the analysis of CPython bytecode by disassembling it. The CPython bytecode which this module takes as an input is defined in the file Include/opcode. h and used by the compiler and the interpreter.
Disassemble Your Python CodeWe can disassemble our code using Python's dis module. This is what disassembling a simple function looks like: We imported the dis module and then called the dis() method to disassemble my_very_special_function . Remember not to include round braces.
The bytecode is a low-level platform-independent representation of your source code, however, it is not the binary machine code and cannot be run by the target machine directly. In fact, it is a set of instructions for a virtual machine which is called the Python Virtual Machine (PVM).
Note that both bytecodes for a
and b
only run <listcomp>
objects defined elsewhere.
2 0 LOAD_CONST 1 (<code object <listcomp> at 0x00000211CBE8B300, file "<stdin>", line 2>)
Since the wrapper functions a
and b
are identical, their bytecodes are the same, only the addresses of listcomps are different.
In python 3.7 the dis module also prints the listcomps, here's the complete code and the output:
import sys
import dis
def my_filter(n):
return n < 5
def a():
# list comprehension with function call
return [i for i in range(10) if my_filter(i)]
def b():
# list comprehension without function call
return [i for i in range(10) if i < 5]
print(sys.version)
print('-' * 70)
dis.dis(a)
print('-' * 70)
dis.dis(b)
--
3.7.3 (default, May 19 2019, 21:16:26)
[Clang 10.0.1 (clang-1001.0.46.4)]
----------------------------------------------------------------------
9 0 LOAD_CONST 1 (<code object <listcomp> at 0x1065c61e0, file "/w/test/x.py", line 9>)
2 LOAD_CONST 2 ('a.<locals>.<listcomp>')
4 MAKE_FUNCTION 0
6 LOAD_GLOBAL 0 (range)
8 LOAD_CONST 3 (10)
10 CALL_FUNCTION 1
12 GET_ITER
14 CALL_FUNCTION 1
16 RETURN_VALUE
Disassembly of <code object <listcomp> at 0x1065c61e0, file "/w/test/x.py", line 9>:
9 0 BUILD_LIST 0
2 LOAD_FAST 0 (.0)
>> 4 FOR_ITER 16 (to 22)
6 STORE_FAST 1 (i)
8 LOAD_GLOBAL 0 (my_filter)
10 LOAD_FAST 1 (i)
12 CALL_FUNCTION 1
14 POP_JUMP_IF_FALSE 4
16 LOAD_FAST 1 (i)
18 LIST_APPEND 2
20 JUMP_ABSOLUTE 4
>> 22 RETURN_VALUE
----------------------------------------------------------------------
13 0 LOAD_CONST 1 (<code object <listcomp> at 0x1066188a0, file "/w/test/x.py", line 13>)
2 LOAD_CONST 2 ('b.<locals>.<listcomp>')
4 MAKE_FUNCTION 0
6 LOAD_GLOBAL 0 (range)
8 LOAD_CONST 3 (10)
10 CALL_FUNCTION 1
12 GET_ITER
14 CALL_FUNCTION 1
16 RETURN_VALUE
Disassembly of <code object <listcomp> at 0x1066188a0, file "/w/test/x.py", line 13>:
13 0 BUILD_LIST 0
2 LOAD_FAST 0 (.0)
>> 4 FOR_ITER 16 (to 22)
6 STORE_FAST 1 (i)
8 LOAD_FAST 1 (i)
10 LOAD_CONST 0 (5)
12 COMPARE_OP 0 (<)
14 POP_JUMP_IF_FALSE 4
16 LOAD_FAST 1 (i)
18 LIST_APPEND 2
20 JUMP_ABSOLUTE 4
>> 22 RETURN_VALUE
For pythons < 3.7. see Python: analyze a list comprehension with dis
List-Comprehensions are converted to inner functions, because they built a separate namespace. The inner functions for the LC in a
and b
differ:
>>> dis.dis(a.__code__.co_consts[1])
3 0 BUILD_LIST 0
2 LOAD_FAST 0 (.0)
>> 4 FOR_ITER 16 (to 22)
6 STORE_FAST 1 (i)
8 LOAD_GLOBAL 0 (my_filter)
10 LOAD_FAST 1 (i)
12 CALL_FUNCTION 1
14 POP_JUMP_IF_FALSE 4
16 LOAD_FAST 1 (i)
18 LIST_APPEND 2
20 JUMP_ABSOLUTE 4
>> 22 RETURN_VALUE
>>> dis.dis(b.__code__.co_consts[1])
3 0 BUILD_LIST 0
2 LOAD_FAST 0 (.0)
>> 4 FOR_ITER 16 (to 22)
6 STORE_FAST 1 (i)
8 LOAD_FAST 1 (i)
10 LOAD_CONST 0 (5)
12 COMPARE_OP 0 (<)
14 POP_JUMP_IF_FALSE 4
16 LOAD_FAST 1 (i)
18 LIST_APPEND 2
20 JUMP_ABSOLUTE 4
>> 22 RETURN_VALUE
There you see the function call in a
and the comparision in b
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With