Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

`yield from` generator vs `yield from` list performance [duplicate]

Python 3.6.8 (default, Oct  7 2019, 12:59:55) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.9.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: def yield_from_generator(): 
   ...:     yield from (i for i in range(10000)) 
   ...:                                                                                                                                    

In [2]: def yield_from_list(): 
   ...:     yield from [i for i in range(10000)] 
   ...:                                                                                                                                    

In [3]: import timeit                                                                                                                      

In [4]: timeit.timeit(lambda: list(yield_from_generator()), number=10000)                                                                  
Out[4]: 5.3820097140014695

In [5]: timeit.timeit(lambda: list(yield_from_list()), number=10000)                                                                       
Out[5]: 4.333915593000711

I run yield from generator and yield from list many times. List version gives always better performance, while my intuition tells me rather opposite conclusions - making list requires i.e. memory allocation at startup. Why we can notice such performance differences?

like image 383
pt12lol Avatar asked Feb 10 '20 12:02

pt12lol


People also ask

Does yield return a generator?

The yield statement returns a generator object to the one who calls the function which contains yield, instead of simply returning a value.

What is generator yield?

The yield keyword pauses generator function execution and the value of the expression following the yield keyword is returned to the generator's caller. It can be thought of as a generator-based version of the return keyword. yield can only be called directly from the generator function that contains it.

Does generator contain yield statement?

Generator function contains one or more yield statements. When called, it returns an object (iterator) but does not start execution immediately. Methods like __iter__() and __next__() are implemented automatically. So we can iterate through the items using next() .

How many times Yield statement can be used in generator?

Unless your generator is infinite, you can iterate through it one time only. Once all values have been evaluated, iteration will stop and the for loop will exit. If you used next() , then instead you'll get an explicit StopIteration exception.


1 Answers

the short answer is that the surface syntax makes them look more similar than they are

I'll break down a series of functions in more detail (the dis module is helpful for this), I'll separate things out into a setup cost and a cost per yielded value. we start with:

def yield_from_generator():
    yield from (i for i in range(10000))

the costs are:

  • setup: create the range object and invoke the embedded generator expression
  • per-yield: yield from the genexpr, which also invokes a next on the range iterator. note that there are two context switches here

next we look at:

def yield_from_list():
    yield from [i for i in range(10000)]

costs are:

  • setup: create a new list and populate it using a list comprehension. this uses special list op-codes so will be fast
  • per-yield: just resumes the list's iterator so is fast

next we look at a similar function:

def yield_from_list2():
    yield from list(i for i in range(10000))

this doesn't use the special list op-codes and has the double nesting of generators so is slow again. costs are:

  • setup: create a new generator expression and pass it to the list constructor, this will iterate over the generator expression that iterates over the range object
  • per-yield: uses the list's iterator so is fast again

and finally a fast version just stressing yield from:

def yield_from_generator2():
    yield from range(10000)

costs are:

  • setup: create a range object
  • per-yield: resume range iterator directly

timings of all of these on my laptop are:

yield_from_generator  639 µs
yield_from_list       536 µs
yield_from_list2      689 µs
yield_from_generator2 354 µs

hopefully it's a bit clearer now. another version is:

def yield_from_list3():
    yield from list(range(10000))

that runs in 401 µs but hopefully it's more obvious why this sits in the middle, performance wise

like image 172
Sam Mason Avatar answered Oct 04 '22 19:10

Sam Mason