Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is summing list comprehension faster than generator expression?

Not sure if title is correct terminology.

If you have to compare the characters in 2 strings (A,B) and count the number of matches of chars in B against A:

sum([ch in A for ch in B])

is faster on %timeit than

sum(ch in A for ch in B)

I understand that the first one will create a list of bool, and then sum the values of 1. The second one is a generator. I'm not clear on what it is doing internally and why it is slower?

Thanks.

Edit with %timeit results:

10 characters

generator expression

list

10000 loops, best of 3: 112 µs per loop

10000 loops, best of 3: 94.6 µs per loop

1000 characters

generator expression

list

100 loops, best of 3: 8.5 ms per loop

100 loops, best of 3: 6.9 ms per loop

10,000 characters

generator expression

list

10 loops, best of 3: 87.5 ms per loop

10 loops, best of 3: 76.1 ms per loop

100,000 characters

generator expression

list

1 loop, best of 3: 908 ms per loop

1 loop, best of 3: 840 ms per loop

like image 642
Chris94549 Avatar asked Jul 19 '20 01:07

Chris94549


People also ask

Is generator expression faster than list comprehension?

Thus, generator expressions are faster than list comprehension and hence time efficient.

What makes a list comprehension different from a generator?

List comprehensions and generators are not different at all; they are just different ways of writing the same thing. A list comprehension produces a list as output, a generator produces a generator object.

Why are list comprehensions faster than for loops?

As we can see, the for loop is slower than the list comprehension (9.9 seconds vs. 8.2 seconds). List comprehensions are faster than for loops to create lists. But, this is because we are creating a list by appending new elements to it at each iteration.

What is the difference between a list comprehension dict comprehension and generator expression?

The only difference between Generator Comprehension and List Comprehension is that the former uses parentheses.


2 Answers

I took a look at the disassembly of each construct (using dis). I did this by declaring these two functions:

def list_comprehension():
    return sum([ch in A for ch in B])

def generation_expression():
    return sum(ch in A for ch in B)

and then calling dis.dis with each function.

For the list comprehension:

 0 BUILD_LIST               0
 2 LOAD_FAST                0 (.0)
 4 FOR_ITER                12 (to 18)
 6 STORE_FAST               1 (ch)
 8 LOAD_FAST                1 (ch)
10 LOAD_GLOBAL              0 (A)
12 COMPARE_OP               6 (in)
14 LIST_APPEND              2
16 JUMP_ABSOLUTE            4
18 RETURN_VALUE

and for the generator expression:

 0 LOAD_FAST                0 (.0)
 2 FOR_ITER                14 (to 18)
 4 STORE_FAST               1 (ch)
 6 LOAD_FAST                1 (ch)
 8 LOAD_GLOBAL              0 (A)
10 COMPARE_OP               6 (in)
12 YIELD_VALUE
14 POP_TOP
16 JUMP_ABSOLUTE            2
18 LOAD_CONST               0 (None)
20 RETURN_VALUE

The disassembly for the actual summation is:

 0 LOAD_GLOBAL              0 (sum)
 2 LOAD_CONST               1 (<code object <genexpr> at 0x7f49dc395240, file "/home/mishac/dev/python/kintsugi/KintsugiModels/automated_tests/a.py", line 12>)
 4 LOAD_CONST               2 ('generation_expression.<locals>.<genexpr>')
 6 MAKE_FUNCTION            0
 8 LOAD_GLOBAL              1 (B)
10 GET_ITER
12 CALL_FUNCTION            1
14 CALL_FUNCTION            1
16 RETURN_VALUE

but this sum disassembly was constant between both your examples, with the only difference being the loading of generation_expression.<locals>.<genexpr> vs list_comprehension.<locals>.<listcomp> (so just loading a different local variable).

The differing bytecode instructions between the first two disassemblies are LIST_APPEND for the list comprehension vs. the conjunction of YIELD_VALUE and POP_TOP for the generator expression.

I won't pretend I know the intrinsics of Python bytecode, but what I gather from this is that the generator expression is implemented as a queue, where the value is generated and then popped. This popping doesn't have to happen in a list comprehension, leading me to believe there'll be a slight amount of overhead in using generators.

Now this doesn't mean that generators are always going to be slower. Generators excel at being memory-efficient, so there will be a threshold N such that list comprehensions will perform slightly better before this threshold (because memory use won't be a problem), but after this threshold, generators will significantly perform better.

like image 64
Mario Ishac Avatar answered Oct 06 '22 23:10

Mario Ishac


Generators are usually slower than list comprehension, the whole point of generators is to be memory efficient because they yield each item by creating them in a lazy fashion (only when actually needed). They favor memory efficency over speed.

like image 31
Andreas Avatar answered Oct 06 '22 23:10

Andreas