Python 3.6.8 (default, Oct 7 2019, 12:59:55)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.9.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: def yield_from_generator():
   ...:     yield from (i for i in range(10000))
   ...:
In [2]: def yield_from_list():
   ...:     yield from [i for i in range(10000)]
   ...:
In [3]: import timeit
In [4]: timeit.timeit(lambda: list(yield_from_generator()), number=10000)
Out[4]: 5.3820097140014695
In [5]: timeit.timeit(lambda: list(yield_from_list()), number=10000)
Out[5]: 4.333915593000711
I ran yield from a generator and yield from a list many times. The list version always performs better, while my intuition suggests the opposite conclusion: building the list requires extra work up front, e.g. memory allocation. Why do we see such a performance difference?
The yield statement makes the enclosing function return a generator object to its caller, instead of simply returning a value.
The yield keyword pauses execution of the generator function, and the value of the expression following yield is handed back to the generator's caller. It can be thought of as a generator-based version of the return keyword. yield can only appear directly inside the generator function that contains it.
A generator function contains one or more yield statements. When called, it returns an iterator object but does not start executing immediately. Methods like __iter__() and __next__() are implemented automatically, so we can iterate through the items using next().
Unless your generator is infinite, you can iterate through it only once. Once all values have been produced, iteration stops and a for loop will exit. If you call next() directly instead, you'll get an explicit StopIteration exception.
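To make the above concrete, here is a minimal sketch (the function name is illustrative) showing lazy execution, next(), and the StopIteration at exhaustion:

```python
def count_up_to(n):
    # A generator function: calling it returns a generator object
    # without running the body at all.
    for i in range(n):
        yield i  # pause here and hand `i` back to the caller

gen = count_up_to(3)
print(next(gen))  # 0
print(next(gen))  # 1
print(next(gen))  # 2
try:
    next(gen)  # all values have been produced
except StopIteration:
    print("exhausted")
```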
the short answer is that the surface syntax makes them look more similar than they are.
I'll break down a series of functions in more detail (the dis module is helpful for this), separating things out into a setup cost and a cost per yielded value. we start with:
def yield_from_generator():
yield from (i for i in range(10000))
the costs are:
- setup: create the generator and the inner genexpr plus its range iterator
- per yield: a context switch into the genexpr, which also invokes a next on the range iterator. note that there are two context switches here
next we look at:
def yield_from_list():
yield from [i for i in range(10000)]
costs are:
- setup: build the whole list up front using the special list op-codes, so this will be fast
- per yield: just resumes the list's iterator, so is fast
next we look at a similar function:
def yield_from_list2():
yield from list(i for i in range(10000))
this doesn't use the special list op-codes and has the double nesting of generators so is slow again. costs are:
- setup: drain the genexpr into a list, paying the generator overhead up front
- per yield: resumes the list's iterator, so is fast again
and finally a fast version just stressing yield from:
def yield_from_generator2():
yield from range(10000)
costs are:
- setup: create the range object
- per yield: resumes the range iterator directly
timings of all of these on my laptop are:
yield_from_generator 639 µs
yield_from_list 536 µs
yield_from_list2 689 µs
yield_from_generator2 354 µs
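The relative ordering can be reproduced with a timeit sketch like the one below (absolute numbers will differ by machine and CPython version, so only the ranking is meaningful):

```python
import timeit

def yield_from_generator():
    yield from (i for i in range(10000))

def yield_from_list():
    yield from [i for i in range(10000)]

def yield_from_generator2():
    yield from range(10000)

# Drain each generator fully via list() and time the whole thing.
for fn in (yield_from_generator, yield_from_list, yield_from_generator2):
    t = timeit.timeit(lambda: list(fn()), number=1000)
    print(f"{fn.__name__:24s} {t:.3f} s")
```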
hopefully it's a bit clearer now. another version is:
def yield_from_list3():
yield from list(range(10000))
that runs in 401 µs, but hopefully it's now more obvious why this sits in the middle, performance-wise.
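The "special list op-codes" mentioned above can be observed with the dis module. Exact op-code names vary between CPython versions, but the list-comprehension variant uses LIST_APPEND to fill the list, while the genexpr variant goes through YIELD_VALUE. A sketch:

```python
import dis
import io

def yield_from_list():
    yield from [i for i in range(10000)]

# dis.dis recursively disassembles nested code objects (such as the
# comprehension's code object on CPython < 3.12), so LIST_APPEND shows up
# in the output either way.
buf = io.StringIO()
dis.dis(yield_from_list, file=buf)
print("LIST_APPEND" in buf.getvalue())  # True on CPython 3.7+
```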