`yield from` generator vs `yield from` list performance [duplicate]

Tags: python, generator, yield, coroutine, list

Python 3.6.8 (default, Oct  7 2019, 12:59:55) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.9.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: def yield_from_generator():
   ...:     yield from (i for i in range(10000))
   ...:

In [2]: def yield_from_list():
   ...:     yield from [i for i in range(10000)]
   ...:

In [3]: import timeit

In [4]: timeit.timeit(lambda: list(yield_from_generator()), number=10000)
Out[4]: 5.3820097140014695

In [5]: timeit.timeit(lambda: list(yield_from_list()), number=10000)
Out[5]: 4.333915593000711

I have run the `yield from` generator and `yield from` list versions many times. The list version always performs better, while my intuition suggests the opposite: building a list requires, e.g., allocating memory up front. Why do we see such a performance difference?


asked Feb 10 '20 12:02

pt12lol

1 Answer

The short answer is that the surface syntax makes them look more similar than they are.

I'll break down a series of functions in more detail (the `dis` module is helpful for this), separating the cost of each into a setup cost and a cost per yielded value. We start with:

def yield_from_generator():
    yield from (i for i in range(10000))

The costs are:

setup: create the range object and the embedded generator expression
per-yield: `yield from` the genexpr, which in turn calls `next` on the range iterator. Note that there are two context switches here: the outer generator is resumed, and it then resumes the genexpr's frame
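To make the two levels of resumption concrete, here is a simplified desugaring of yield_from_generator. It is only a sketch: genexpr_body and yield_from_generator_desugared are hypothetical names, and the loop ignores the send()/throw()/close() forwarding and return-value handling that real `yield from` (PEP 380) performs.

```python
def genexpr_body(iterable):
    # Roughly what `(i for i in range(10000))` becomes: a separate generator frame.
    for i in iterable:
        yield i


def yield_from_generator_desugared(n=10000):
    # setup: build the range object and create the inner generator
    inner = genexpr_body(range(n))
    # Simplified `yield from inner` (no send/throw/close forwarding).
    while True:
        try:
            value = next(inner)  # resumes the inner generator frame, which steps the range iterator
        except StopIteration:
            return
        yield value              # then suspends this outer frame until the consumer asks again
```

Every value therefore costs one resumption of the outer frame plus one resumption of the inner frame.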

Next we look at:

def yield_from_list():
    yield from [i for i in range(10000)]

Costs are:

setup: create a new list and populate it using a list comprehension. This uses special list opcodes (LIST_APPEND), so it is fast
per-yield: just resumes the list's iterator, so it is fast
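One way to see the setup/per-yield split is to count how much of the source has been consumed by the time the first value comes out. The sketch below assumes a hypothetical CountingRange helper and variants of the two functions that take their source as an argument; the Python-level counting makes it useless for timing, it only shows *when* the work happens.

```python
class CountingRange:
    """Hypothetical helper: iterates like range(n) and records how many values were pulled."""

    def __init__(self, n):
        self.n = n
        self.pulled = 0

    def __iter__(self):
        for i in range(self.n):
            self.pulled += 1
            yield i


def yield_from_generator(src):
    yield from (i for i in src)


def yield_from_list(src):
    yield from [i for i in src]


def pulled_at_first_value(make_gen, n=10000):
    src = CountingRange(n)
    gen = make_gen(src)
    next(gen)           # ask for just the first value
    return src.pulled   # how much of the source was consumed to produce it


print(pulled_at_first_value(yield_from_generator))  # lazy: one value pulled
print(pulled_at_first_value(yield_from_list))       # eager: whole list built up front
```

The generator version pulls a single value, while the list version has already pulled all 10000 during setup, after which each yield is cheap.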

Next we look at a similar function:

def yield_from_list2():
    yield from list(i for i in range(10000))

This doesn't use the special list opcodes and, like yield_from_generator, it has to resume a generator-expression frame for every element, so it is slow again. Costs are:

setup: create a new generator expression and pass it to the list constructor; the constructor iterates over the generator expression, which in turn iterates over the range object
per-yield: uses the list's iterator, so it is fast again
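The opcode claim can be checked with the `dis` module. In the sketch below, all_opnames is a hypothetical helper that collects opcode names from a function's bytecode, including nested code objects such as comprehension and genexpr bodies. Walking nested code matters because CPython 3.12 inlines list comprehensions into the enclosing function (PEP 709), while earlier versions compile them to a separate `<listcomp>` code object.

```python
import dis
import types


def yield_from_list():
    yield from [i for i in range(10000)]


def yield_from_list2():
    yield from list(i for i in range(10000))


def all_opnames(code):
    """Collect opcode names from a code object and every code object nested in it."""
    names = {ins.opname for ins in dis.get_instructions(code)}
    for const in code.co_consts:
        if isinstance(const, types.CodeType):  # e.g. <listcomp> or <genexpr> body
            names |= all_opnames(const)
    return names


for fn in (yield_from_list, yield_from_list2):
    print(fn.__name__, "uses LIST_APPEND:", "LIST_APPEND" in all_opnames(fn.__code__))
```

On CPython this should report LIST_APPEND only for yield_from_list; yield_from_list2 hands the genexpr to the `list` constructor instead.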

And finally, a fast version that just stresses `yield from`:

def yield_from_generator2():
    yield from range(10000)

Costs are:

setup: create a range object
per-yield: resume the range iterator directly
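The per-yield differences come down to what `yield from` ends up delegating to. This sketch builds the delegate each function iterates over (`yield from` calls iter() on a list or range) and reports its type. On CPython, list_iterator and range_iterator are implemented in C, so stepping them does not resume a Python frame, whereas a generator expression does.

```python
import inspect

# What each function's `yield from` delegates to.
delegates = {
    "yield_from_generator": (i for i in range(10000)),    # generator expression
    "yield_from_list": iter([i for i in range(10000)]),  # the list's iterator
    "yield_from_generator2": iter(range(10000)),         # the range's iterator
}

for name, it in delegates.items():
    # Only a generator has its own Python frame to resume for every value.
    print(f"{name:22} {type(it).__name__:15} generator={inspect.isgenerator(it)}")
```

On CPython this reports `generator`, `list_iterator` and `range_iterator` respectively, with only the first being a generator.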

Timings of all of these on my laptop are:

yield_from_generator  639 µs
yield_from_list       536 µs
yield_from_list2      689 µs
yield_from_generator2 354 µs

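Hopefully it's a bit clearer now. Another version is:

def yield_from_list3():
    yield from list(range(10000))

That runs in 401 µs, and hopefully it's more obvious why it sits in the middle performance-wise: the list is built by the list constructor directly from the range iterator, with no generator frame resumed per element, and each yield only resumes the list's iterator. The extra cost over yield_from_generator2 is mainly allocating and filling the list.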
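The figures above come from one laptop and will differ across machines and Python versions. A minimal sketch for reproducing the comparison, assuming CPython: it times all five functions from this answer with `timeit.repeat` and keeps the minimum of several runs, as the `timeit` documentation suggests. The `benchmark` helper and its defaults are illustrative choices, not the author's setup.

```python
import timeit


def yield_from_generator():
    yield from (i for i in range(10000))


def yield_from_list():
    yield from [i for i in range(10000)]


def yield_from_list2():
    yield from list(i for i in range(10000))


def yield_from_generator2():
    yield from range(10000)


def yield_from_list3():
    yield from list(range(10000))


FUNCS = [yield_from_generator, yield_from_list, yield_from_list2,
         yield_from_generator2, yield_from_list3]


def benchmark(number=100, repeat=5):
    """Return the best per-call time in microseconds for each function."""
    results = {}
    for fn in FUNCS:
        # list(fn()) drains the generator, as in the question's timeit call
        times = timeit.repeat(lambda: list(fn()), number=number, repeat=repeat)
        results[fn.__name__] = min(times) / number * 1e6
    return results


if __name__ == "__main__":
    for name, usec in benchmark().items():
        print(f"{name:22} {usec:6.0f} µs")
```

Raising `number` and `repeat` gives steadier figures at the cost of a longer run.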


answered Oct 04 '22 19:10

Sam Mason
