Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is this loop faster than a dictionary comprehension for creating a dictionary?

I don't come from a software/computer science background but I love to code in Python and can generally understand why things are faster. I am really curious to know why this for loop runs faster than the dictionary comprehension. Any insights?

Problem : Given a dictionary a with these keys and values, return a dictionary with the values as keys and the keys as values. (challenge: do this in one line)

and the code

a = {'a':'hi','b':'hey','c':'yo'}  b = {} for i,j in a.items():     b[j]=i  %% timeit 932 ns ± 37.2 ns per loop  b = {v: k for k, v in a.items()}  %% timeit 1.08 µs ± 16.4 ns per loop 
like image 311
Nadim Younes Avatar asked Sep 27 '18 18:09

Nadim Younes


People also ask

Why are dictionaries faster Python?

Lookups are faster in dictionaries because Python implements them using hash tables. If we explain the difference by Big O concepts, dictionaries have constant time complexity, O(1) while lists have linear time complexity, O(n).

What are the advantages of dictionary comprehension?

Advantages of Using Dictionary Comprehension As we can see, dictionary comprehension shortens the process of dictionary initialization by a lot. It makes the code more pythonic. Using dictionary comprehension in our code can shorten the lines of code while keeping the logic intact.

What does it mean to loop a dictionary?

You can loop through a dictionary by using a for loop. When looping through a dictionary, the return value are the keys of the dictionary, but there are methods to return the values as well.

Why do we use dictionary comprehension in Python?

Dictionaries in Python allow us to store a series of mappings between two sets of values, namely, the keys and the values. All items in the dictionary are enclosed within a pair of curly braces {} . Each item in a dictionary is a mapping between a key and a value - called a key-value pair.


1 Answers

You are testing with way too small an input; while a dictionary comprehension doesn't have as much of a performance advantage against a for loop when compared to a list comprehension, for realistic problem sizes it can and does beat for loops, especially when targeting a global name.

Your input consists of just 3 key-value pairs. Testing with 1000 elements instead, we see that the timings are very close instead:

>>> import timeit >>> from random import choice, randint; from string import ascii_lowercase as letters >>> looped = '''\ ... b = {} ... for i,j in a.items(): ...     b[j]=i ... ''' >>> dictcomp = '''b = {v: k for k, v in a.items()}''' >>> def rs(): return ''.join([choice(letters) for _ in range(randint(3, 15))]) ... >>> a = {rs(): rs() for _ in range(1000)} >>> len(a) 1000 >>> count, total = timeit.Timer(looped, 'from __main__ import a').autorange() >>> (total / count) * 1000000   # microseconds per run 66.62004760000855 >>> count, total = timeit.Timer(dictcomp, 'from __main__ import a').autorange() >>> (total / count) * 1000000   # microseconds per run 64.5464928005822 

The difference is there, the dict comp is faster but only just at this scale. With 100 times as many key-value pairs the difference is a bit bigger:

>>> a = {rs(): rs() for _ in range(100000)} >>> len(a) 98476 >>> count, total = timeit.Timer(looped, 'from __main__ import a').autorange() >>> total / count * 1000  # milliseconds, different scale! 15.48140200029593 >>> count, total = timeit.Timer(dictcomp, 'from __main__ import a').autorange() >>> total / count * 1000  # milliseconds, different scale! 13.674790799996117 

which is not that big a difference when you consider both processed nearly 100k key-value pairs. Still, the for loop is clearly slower.

So why the speed difference with 3 elements? Because a comprehension (dictionary, set, list comprehensions or a generator expression) is under the hood implemented as a new function, and calling that function has a base cost the plain loop doesn't have to pay.

Here's the disassembly for the bytecode for both alternatives; note the MAKE_FUNCTION and CALL_FUNCTION opcodes in the top-level bytecode for the dict comprehension, there is a separate section for what that function then does, and there are actually very few differences in between the two approaches here:

>>> import dis >>> dis.dis(looped)   1           0 BUILD_MAP                0               2 STORE_NAME               0 (b)    2           4 SETUP_LOOP              28 (to 34)               6 LOAD_NAME                1 (a)               8 LOAD_METHOD              2 (items)              10 CALL_METHOD              0              12 GET_ITER         >>   14 FOR_ITER                16 (to 32)              16 UNPACK_SEQUENCE          2              18 STORE_NAME               3 (i)              20 STORE_NAME               4 (j)    3          22 LOAD_NAME                3 (i)              24 LOAD_NAME                0 (b)              26 LOAD_NAME                4 (j)              28 STORE_SUBSCR              30 JUMP_ABSOLUTE           14         >>   32 POP_BLOCK         >>   34 LOAD_CONST               0 (None)              36 RETURN_VALUE >>> dis.dis(dictcomp)   1           0 LOAD_CONST               0 (<code object <dictcomp> at 0x11d6ade40, file "<dis>", line 1>)               2 LOAD_CONST               1 ('<dictcomp>')               4 MAKE_FUNCTION            0               6 LOAD_NAME                0 (a)               8 LOAD_METHOD              1 (items)              10 CALL_METHOD              0              12 GET_ITER              14 CALL_FUNCTION            1              16 STORE_NAME               2 (b)              18 LOAD_CONST               2 (None)              20 RETURN_VALUE  Disassembly of <code object <dictcomp> at 0x11d6ade40, file "<dis>", line 1>:   1           0 BUILD_MAP                0               2 LOAD_FAST                0 (.0)         >>    4 FOR_ITER                14 (to 20)               6 UNPACK_SEQUENCE          2               8 STORE_FAST               1 (k)              10 STORE_FAST               2 (v)              12 LOAD_FAST                1 (k)              14 LOAD_FAST                2 (v)              16 MAP_ADD                  2              18 JUMP_ABSOLUTE            4         >>   20 RETURN_VALUE 

The material differences: the looped code uses LOAD_NAME for b each iteration, and STORE_SUBSCR to store the key-value pair in dict loaded. The dictionary comprehension uses MAP_ADD to achieve the same thing as STORE_SUBSCR but doesn't have to load that b name each time.

But with 3 iterations only, the MAKE_FUNCTION / CALL_FUNCTION combo the dict comprehension has to execute is the real drag on the performance:

>>> make_and_call = '(lambda i: None)(None)' >>> dis.dis(make_and_call)   1           0 LOAD_CONST               0 (<code object <lambda> at 0x11d6ab270, file "<dis>", line 1>)               2 LOAD_CONST               1 ('<lambda>')               4 MAKE_FUNCTION            0               6 LOAD_CONST               2 (None)               8 CALL_FUNCTION            1              10 RETURN_VALUE  Disassembly of <code object <lambda> at 0x11d6ab270, file "<dis>", line 1>:   1           0 LOAD_CONST               0 (None)               2 RETURN_VALUE >>> count, total = timeit.Timer(make_and_call).autorange() >>> total / count * 1000000 0.12945385499915574 

More than 0.1 μs to create a function object with one argument, and call it (with an extra LOAD_CONST for the None value we pass in)! And that's just about the difference between the looped and comprehension timings for 3 key-value pairs.

You can liken this to being surprised that a man with a shovel can dig a small hole faster than a backhoe can. The backhoe can certainly dig fast, but a man with a shovel can get started faster if you need to get the backhoe started and moved into position first!

Beyond a few key-value pairs (digging a bigger hole), the function create and call cost fades away into nothingness. At this point the dict comprehension and the explicit loop basically do the same thing:

  • take the next key-value pair, pop those on the stack
  • call the dict.__setitem__ hook via a bytecode operation with the top two items on the stack (either STORE_SUBSCR or MAP_ADD. This doesn't count as a 'function call' as it's all internally handled in the interpreter loop.

This is different from a list comprehension, where the plain loop version would have to use list.append(), involving an attribute lookup, and a function call each loop iteration. The list comprehension speed advantage comes from this difference; see Python list comprehension expensive

What a dict comprehension does add, is that the target dictionary name only needs to be looked up once, when binding b to the the final dictionary object. If the target dictionary is a global instead of a local variable, the comprehension wins, hands down:

>>> a = {rs(): rs() for _ in range(1000)} >>> len(a) 1000 >>> namespace = {} >>> count, total = timeit.Timer(looped, 'from __main__ import a; global b', globals=namespace).autorange() >>> (total / count) * 1000000 76.72348440100905 >>> count, total = timeit.Timer(dictcomp, 'from __main__ import a; global b', globals=namespace).autorange() >>> (total / count) * 1000000 64.72114819916897 >>> len(namespace['b']) 1000 

So just use a dict comprehension. The difference with < 30 elements to process is too small to care about, and the moment you are generating a global or have more items, the dict comprehension wins out anyway.

like image 177
Martijn Pieters Avatar answered Sep 22 '22 21:09

Martijn Pieters