I don't come from a software/computer science background but I love to code in Python and can generally understand why things are faster. I am really curious to know why this for loop runs faster than the dictionary comprehension. Any insights? <blockquote> Problem : Given a dictionary <code>a</code> with these keys and values, return a dictionary with the values as keys and the keys as values. (challenge: do this in one line) </blockquote> and the code <pre class="prettyprint"><code>a = {'a':'hi','b':'hey','c':'yo'} b = {} for i,j in a.items(): b[j]=i %% timeit 932 ns ± 37.2 ns per loop b = {v: k for k, v in a.items()} %% timeit 1.08 µs ± 16.4 ns per loop </code></pre>

You are testing with way too small an input; while a dictionary comprehension doesn't have as much of a performance advantage against a <code>for</code> loop when compared to a list comprehension, for realistic problem sizes it can and does beat <code>for</code> loops, especially when targeting a global name. Your input consists of just 3 key-value pairs. Testing with 1000 elements instead, we see that the timings are very close instead: <pre class="prettyprint"><code>>>> import timeit >>> from random import choice, randint; from string import ascii_lowercase as letters >>> looped = '''\ ... b = {} ... for i,j in a.items(): ... b[j]=i ... ''' >>> dictcomp = '''b = {v: k for k, v in a.items()}''' >>> def rs(): return ''.join([choice(letters) for _ in range(randint(3, 15))]) ... >>> a = {rs(): rs() for _ in range(1000)} >>> len(a) 1000 >>> count, total = timeit.Timer(looped, 'from __main__ import a').autorange() >>> (total / count) * 1000000 # microseconds per run 66.62004760000855 >>> count, total = timeit.Timer(dictcomp, 'from __main__ import a').autorange() >>> (total / count) * 1000000 # microseconds per run 64.5464928005822 </code></pre> The difference is there, the dict comp is faster but only just at this scale. With 100 times as many key-value pairs the difference is a bit bigger: <pre class="prettyprint"><code>>>> a = {rs(): rs() for _ in range(100000)} >>> len(a) 98476 >>> count, total = timeit.Timer(looped, 'from __main__ import a').autorange() >>> total / count * 1000 # milliseconds, different scale! 15.48140200029593 >>> count, total = timeit.Timer(dictcomp, 'from __main__ import a').autorange() >>> total / count * 1000 # milliseconds, different scale! 13.674790799996117 </code></pre> which is not that big a difference when you consider both processed nearly 100k key-value pairs. Still, the <code>for</code> loop is clearly slower. So why the speed difference with 3 elements? Because a comprehension (dictionary, set, list comprehensions or a generator expression) is under the hood implemented as a new function, and calling that function has a base cost the plain loop doesn't have to pay. Here's the disassembly for the bytecode for both alternatives; note the <code>MAKE_FUNCTION</code> and <code>CALL_FUNCTION</code> opcodes in the top-level bytecode for the dict comprehension, there is a separate section for what that function then does, and there are actually very few differences in between the two approaches here: <pre class="prettyprint"><code>>>> import dis >>> dis.dis(looped) 1 0 BUILD_MAP 0 2 STORE_NAME 0 (b) 2 4 SETUP_LOOP 28 (to 34) 6 LOAD_NAME 1 (a) 8 LOAD_METHOD 2 (items) 10 CALL_METHOD 0 12 GET_ITER >> 14 FOR_ITER 16 (to 32) 16 UNPACK_SEQUENCE 2 18 STORE_NAME 3 (i) 20 STORE_NAME 4 (j) 3 22 LOAD_NAME 3 (i) 24 LOAD_NAME 0 (b) 26 LOAD_NAME 4 (j) 28 STORE_SUBSCR 30 JUMP_ABSOLUTE 14 >> 32 POP_BLOCK >> 34 LOAD_CONST 0 (None) 36 RETURN_VALUE >>> dis.dis(dictcomp) 1 0 LOAD_CONST 0 (<code object <dictcomp> at 0x11d6ade40, file "<dis>", line 1>) 2 LOAD_CONST 1 ('<dictcomp>') 4 MAKE_FUNCTION 0 6 LOAD_NAME 0 (a) 8 LOAD_METHOD 1 (items) 10 CALL_METHOD 0 12 GET_ITER 14 CALL_FUNCTION 1 16 STORE_NAME 2 (b) 18 LOAD_CONST 2 (None) 20 RETURN_VALUE Disassembly of <code object <dictcomp> at 0x11d6ade40, file "<dis>", line 1>: 1 0 BUILD_MAP 0 2 LOAD_FAST 0 (.0) >> 4 FOR_ITER 14 (to 20) 6 UNPACK_SEQUENCE 2 8 STORE_FAST 1 (k) 10 STORE_FAST 2 (v) 12 LOAD_FAST 1 (k) 14 LOAD_FAST 2 (v) 16 MAP_ADD 2 18 JUMP_ABSOLUTE 4 >> 20 RETURN_VALUE </code></pre> The material differences: the looped code uses <code>LOAD_NAME</code> for <code>b</code> each iteration, and <code>STORE_SUBSCR</code> to store the key-value pair in dict loaded. The dictionary comprehension uses <code>MAP_ADD</code> to achieve the same thing as <code>STORE_SUBSCR</code> but doesn't have to load that <code>b</code> name each time. But with 3 iterations only, the <code>MAKE_FUNCTION</code> / <code>CALL_FUNCTION</code> combo the dict comprehension has to execute is the real drag on the performance: <pre class="prettyprint"><code>>>> make_and_call = '(lambda i: None)(None)' >>> dis.dis(make_and_call) 1 0 LOAD_CONST 0 (<code object <lambda> at 0x11d6ab270, file "<dis>", line 1>) 2 LOAD_CONST 1 ('<lambda>') 4 MAKE_FUNCTION 0 6 LOAD_CONST 2 (None) 8 CALL_FUNCTION 1 10 RETURN_VALUE Disassembly of <code object <lambda> at 0x11d6ab270, file "<dis>", line 1>: 1 0 LOAD_CONST 0 (None) 2 RETURN_VALUE >>> count, total = timeit.Timer(make_and_call).autorange() >>> total / count * 1000000 0.12945385499915574 </code></pre> More than 0.1 μs to create a function object with one argument, and call it (with an extra <code>LOAD_CONST</code> for the <code>None</code> value we pass in)! And that's just about the difference between the looped and comprehension timings for 3 key-value pairs. You can liken this to being surprised that a man with a shovel can dig a small hole faster than a backhoe can. The backhoe can certainly dig fast, but a man with a shovel can get started faster if you need to get the backhoe started and moved into position first! Beyond a few key-value pairs (digging a bigger hole), the function create and call cost fades away into nothingness. At this point the dict comprehension and the explicit loop basically do the same thing: <ul> <li>take the next key-value pair, pop those on the stack</li> <li>call the <code>dict.__setitem__</code> hook via a bytecode operation with the top two items on the stack (either <code>STORE_SUBSCR</code> or <code>MAP_ADD</code>. This doesn't count as a 'function call' as it's all internally handled in the interpreter loop.</li> </ul> This is different from a list comprehension, where the plain loop version would have to use <code>list.append()</code>, involving an attribute lookup, and a function call each loop iteration. The list comprehension speed advantage comes from this difference; see Python list comprehension expensive What a dict comprehension does add, is that the target dictionary name only needs to be looked up once, when binding <code>b</code> to the the final dictionary object. If the target dictionary is a global instead of a local variable, the comprehension wins, hands down: <pre class="prettyprint"><code>>>> a = {rs(): rs() for _ in range(1000)} >>> len(a) 1000 >>> namespace = {} >>> count, total = timeit.Timer(looped, 'from __main__ import a; global b', globals=namespace).autorange() >>> (total / count) * 1000000 76.72348440100905 >>> count, total = timeit.Timer(dictcomp, 'from __main__ import a; global b', globals=namespace).autorange() >>> (total / count) * 1000000 64.72114819916897 >>> len(namespace['b']) 1000 </code></pre> So just use a dict comprehension. The difference with < 30 elements to process is too small to care about, and the moment you are generating a global or have more items, the dict comprehension wins out anyway.

Why is this loop faster than a dictionary comprehension for creating a dictionary?

Tags:

performance

python

python-3.x

python-internals

dictionary-comprehension

I don't come from a software/computer science background but I love to code in Python and can generally understand why things are faster. I am really curious to know why this for loop runs faster than the dictionary comprehension. Any insights?

Problem : Given a dictionary a with these keys and values, return a dictionary with the values as keys and the keys as values. (challenge: do this in one line)

and the code

a = {'a':'hi','b':'hey','c':'yo'}  b = {} for i,j in a.items():     b[j]=i  %% timeit 932 ns ± 37.2 ns per loop  b = {v: k for k, v in a.items()}  %% timeit 1.08 µs ± 16.4 ns per loop

311

asked Sep 27 '18 18:09

Nadim Younes

1 Answers

You are testing with way too small an input; while a dictionary comprehension doesn't have as much of a performance advantage against a for loop when compared to a list comprehension, for realistic problem sizes it can and does beat for loops, especially when targeting a global name.

Your input consists of just 3 key-value pairs. Testing with 1000 elements instead, we see that the timings are very close instead:

>>> import timeit >>> from random import choice, randint; from string import ascii_lowercase as letters >>> looped = '''\ ... b = {} ... for i,j in a.items(): ...     b[j]=i ... ''' >>> dictcomp = '''b = {v: k for k, v in a.items()}''' >>> def rs(): return ''.join([choice(letters) for _ in range(randint(3, 15))]) ... >>> a = {rs(): rs() for _ in range(1000)} >>> len(a) 1000 >>> count, total = timeit.Timer(looped, 'from __main__ import a').autorange() >>> (total / count) * 1000000   # microseconds per run 66.62004760000855 >>> count, total = timeit.Timer(dictcomp, 'from __main__ import a').autorange() >>> (total / count) * 1000000   # microseconds per run 64.5464928005822

The difference is there, the dict comp is faster but only just at this scale. With 100 times as many key-value pairs the difference is a bit bigger:

>>> a = {rs(): rs() for _ in range(100000)} >>> len(a) 98476 >>> count, total = timeit.Timer(looped, 'from __main__ import a').autorange() >>> total / count * 1000  # milliseconds, different scale! 15.48140200029593 >>> count, total = timeit.Timer(dictcomp, 'from __main__ import a').autorange() >>> total / count * 1000  # milliseconds, different scale! 13.674790799996117

which is not that big a difference when you consider both processed nearly 100k key-value pairs. Still, the for loop is clearly slower.

So why the speed difference with 3 elements? Because a comprehension (dictionary, set, list comprehensions or a generator expression) is under the hood implemented as a new function, and calling that function has a base cost the plain loop doesn't have to pay.

Here's the disassembly for the bytecode for both alternatives; note the MAKE_FUNCTION and CALL_FUNCTION opcodes in the top-level bytecode for the dict comprehension, there is a separate section for what that function then does, and there are actually very few differences in between the two approaches here:

>>> import dis >>> dis.dis(looped)   1           0 BUILD_MAP                0               2 STORE_NAME               0 (b)    2           4 SETUP_LOOP              28 (to 34)               6 LOAD_NAME                1 (a)               8 LOAD_METHOD              2 (items)              10 CALL_METHOD              0              12 GET_ITER         >>   14 FOR_ITER                16 (to 32)              16 UNPACK_SEQUENCE          2              18 STORE_NAME               3 (i)              20 STORE_NAME               4 (j)    3          22 LOAD_NAME                3 (i)              24 LOAD_NAME                0 (b)              26 LOAD_NAME                4 (j)              28 STORE_SUBSCR              30 JUMP_ABSOLUTE           14         >>   32 POP_BLOCK         >>   34 LOAD_CONST               0 (None)              36 RETURN_VALUE >>> dis.dis(dictcomp)   1           0 LOAD_CONST               0 (<code object <dictcomp> at 0x11d6ade40, file "<dis>", line 1>)               2 LOAD_CONST               1 ('<dictcomp>')               4 MAKE_FUNCTION            0               6 LOAD_NAME                0 (a)               8 LOAD_METHOD              1 (items)              10 CALL_METHOD              0              12 GET_ITER              14 CALL_FUNCTION            1              16 STORE_NAME               2 (b)              18 LOAD_CONST               2 (None)              20 RETURN_VALUE  Disassembly of <code object <dictcomp> at 0x11d6ade40, file "<dis>", line 1>:   1           0 BUILD_MAP                0               2 LOAD_FAST                0 (.0)         >>    4 FOR_ITER                14 (to 20)               6 UNPACK_SEQUENCE          2               8 STORE_FAST               1 (k)              10 STORE_FAST               2 (v)              12 LOAD_FAST                1 (k)              14 LOAD_FAST                2 (v)              16 MAP_ADD                  2              18 JUMP_ABSOLUTE            4         >>   20 RETURN_VALUE

The material differences: the looped code uses LOAD_NAME for b each iteration, and STORE_SUBSCR to store the key-value pair in dict loaded. The dictionary comprehension uses MAP_ADD to achieve the same thing as STORE_SUBSCR but doesn't have to load that b name each time.

But with 3 iterations only, the MAKE_FUNCTION / CALL_FUNCTION combo the dict comprehension has to execute is the real drag on the performance:

>>> make_and_call = '(lambda i: None)(None)' >>> dis.dis(make_and_call)   1           0 LOAD_CONST               0 (<code object <lambda> at 0x11d6ab270, file "<dis>", line 1>)               2 LOAD_CONST               1 ('<lambda>')               4 MAKE_FUNCTION            0               6 LOAD_CONST               2 (None)               8 CALL_FUNCTION            1              10 RETURN_VALUE  Disassembly of <code object <lambda> at 0x11d6ab270, file "<dis>", line 1>:   1           0 LOAD_CONST               0 (None)               2 RETURN_VALUE >>> count, total = timeit.Timer(make_and_call).autorange() >>> total / count * 1000000 0.12945385499915574

More than 0.1 μs to create a function object with one argument, and call it (with an extra LOAD_CONST for the None value we pass in)! And that's just about the difference between the looped and comprehension timings for 3 key-value pairs.

You can liken this to being surprised that a man with a shovel can dig a small hole faster than a backhoe can. The backhoe can certainly dig fast, but a man with a shovel can get started faster if you need to get the backhoe started and moved into position first!

Beyond a few key-value pairs (digging a bigger hole), the function create and call cost fades away into nothingness. At this point the dict comprehension and the explicit loop basically do the same thing:

take the next key-value pair, pop those on the stack
call the dict.__setitem__ hook via a bytecode operation with the top two items on the stack (either STORE_SUBSCR or MAP_ADD. This doesn't count as a 'function call' as it's all internally handled in the interpreter loop.

This is different from a list comprehension, where the plain loop version would have to use list.append(), involving an attribute lookup, and a function call each loop iteration. The list comprehension speed advantage comes from this difference; see Python list comprehension expensive

What a dict comprehension does add, is that the target dictionary name only needs to be looked up once, when binding b to the the final dictionary object. If the target dictionary is a global instead of a local variable, the comprehension wins, hands down:

>>> a = {rs(): rs() for _ in range(1000)} >>> len(a) 1000 >>> namespace = {} >>> count, total = timeit.Timer(looped, 'from __main__ import a; global b', globals=namespace).autorange() >>> (total / count) * 1000000 76.72348440100905 >>> count, total = timeit.Timer(dictcomp, 'from __main__ import a; global b', globals=namespace).autorange() >>> (total / count) * 1000000 64.72114819916897 >>> len(namespace['b']) 1000

So just use a dict comprehension. The difference with < 30 elements to process is too small to care about, and the moment you are generating a global or have more items, the dict comprehension wins out anyway.

177

answered Sep 22 '22 21:09

Martijn Pieters

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Why is this loop faster than a dictionary comprehension for creating a dictionary?

Tags:

performance

python

python-3.x

python-internals

dictionary-comprehension

Nadim Younes

People also ask

1 Answers

Martijn Pieters

Recent Activity

Donate For Us

Why is this loop faster than a dictionary comprehension for creating a dictionary?

Tags:

performance

python

python-3.x

python-internals

dictionary-comprehension

Nadim Younes

People also ask

1 Answers

Martijn Pieters

Related questions

Recent Activity

Donate For Us