This is rather the inverse of What can you use Python generator functions for?: Python generators, generator expressions, and the itertools
module are some of my favorite features of Python these days. They're especially useful when setting up chains of operations to perform on a big pile of data; I often use them when processing DSV files.
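By "chains of operations" I mean something like the following; a minimal sketch, where the file name, the tab delimiter, and the numeric third column are all invented for illustration:

    import csv

    def read_rows(path):
        # each stage below is a generator, so only one row is in flight at a time
        with open(path, newline='') as f:
            yield from csv.reader(f, delimiter='\t')

    def nonblank(rows):
        return (row for row in rows if any(field.strip() for field in row))

    def amounts(rows):
        return (float(row[2]) for row in rows)  # assumes a numeric third column

    print(sum(amounts(nonblank(read_rows('data.tsv')))))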
So when is it not a good time to use a generator, or a generator expression, or an itertools function? When might I prefer zip() over itertools.izip(), or range() over xrange(), or [x for x in foo] over (x for x in foo)?

Obviously, we eventually need to "resolve" a generator into actual data, usually by creating a list or iterating over it with a non-generator loop. Sometimes we just need to know the length. This isn't what I'm asking.
We use generators so that we're not allocating new lists in memory for interim data. This especially makes sense for large datasets. Does it make sense for small datasets too? Is there a noticeable memory/CPU trade-off?

I'm especially interested in whether anyone has profiled this, in light of the eye-opening discussion of list comprehension performance vs. map() and filter(). (alt link)
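For concreteness, this is the sort of micro-benchmark I have in mind; a minimal sketch using only the standard library, with arbitrary input sizes and an arbitrary squaring workload:

    import timeit

    for n in (10, 100, 1_000):
        setup = f'data = range({n})'
        lc = timeit.timeit('sum([x * x for x in data])', setup, number=10_000)
        ge = timeit.timeit('sum(x * x for x in data)', setup, number=10_000)
        print(f'n={n}: list comp {lc:.3f}s, genexp {ge:.3f}s')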
Generators are great for problems that require reading through a large dataset, which would otherwise force your computer or server to allocate memory for the whole thing at once. The one condition to remember is that a generator can only be iterated once.
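A minimal sketch of that single-pass behaviour (the generator expression here is just for illustration):

    squares = (x * x for x in range(3))
    print(list(squares))  # [0, 1, 4]
    print(list(squares))  # [] -- the generator is already exhausted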
Generators are good for calculating large sets of results (in particular calculations involving loops themselves) where you don't know if you are going to need all results, or where you don't want to allocate the memory for all results at the same time.
In Python, yield is a keyword that turns a function into a generator. Unlike a list, a generator does not store values. Instead, it knows the current value and how to get the next one. This makes a generator memory-efficient.
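For example, a small sketch contrasting the footprint of a list with that of an equivalent generator (the function names are made up, and exact byte counts vary by Python version):

    import sys

    def squares_list(n):
        return [x * x for x in range(n)]  # stores every value up front

    def squares_gen(n):
        for x in range(n):
            yield x * x                   # produces one value at a time

    print(sys.getsizeof(squares_list(1_000_000)))  # several megabytes
    print(sys.getsizeof(squares_gen(1_000_000)))   # a couple hundred bytes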
Generators can also be looped over like a regular list. The benefit of using a loop is that it will run through every yielded value from start to end, and we do not have to worry about the function raising a StopIteration exception when there is no more data to go through: the for loop handles that for us.
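For instance (countdown is a made-up generator; the for loop drives it and absorbs StopIteration for us, while a bare next() call is what would eventually raise it):

    def countdown(n):
        while n > 0:
            yield n
            n -= 1

    for value in countdown(3):  # runs through every yield, then stops cleanly
        print(value)            # 3, 2, 1

    g = countdown(1)
    next(g)    # 1
    # next(g)  # a second call would raise StopIteration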
Use a list instead of a generator when:
1) You need to access the data multiple times (i.e. cache the results instead of recomputing them):
    for i in outer:        # used once, okay to be a generator or return a list
        for j in inner:    # used multiple times, reusing a list is better
            ...
2) You need random access (or any access other than forward sequential order):
    for i in reversed(data):    # generators aren't reversible
        ...
    s[i], s[j] = s[j], s[i]     # generators aren't indexable
3) You need to join strings (which requires two passes over the data; see the benchmark sketch after this list):

    s = ''.join(data)  # lists are faster than generators in this use case
4) You are using PyPy, which sometimes can't optimize generator code as well as it can normal function calls and list manipulations.
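A quick way to check the join claim from point 3; a minimal sketch, and the exact numbers will vary by interpreter and input size:

    import timeit

    setup = "words = ['word'] * 10_000"
    as_list = timeit.timeit("''.join([w.upper() for w in words])", setup, number=500)
    as_gen = timeit.timeit("''.join(w.upper() for w in words)", setup, number=500)
    print(f'list: {as_list:.3f}s  generator: {as_gen:.3f}s')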