When is it not a good time to use Python generators?

This is rather the inverse of "What can you use Python generator functions for?": Python generators, generator expressions, and the itertools module are some of my favorite features of Python these days. They're especially useful when setting up chains of operations to perform on a big pile of data; I often use them when processing DSV files.

So when is it not a good time to use a generator, or a generator expression, or an itertools function?

  • When should I prefer zip() over itertools.izip(), or
  • range() over xrange(), or
  • [x for x in foo] over (x for x in foo)?

Obviously, we eventually need to "resolve" a generator into actual data, usually by creating a list or iterating over it with a non-generator loop. Sometimes we just need to know the length. This isn't what I'm asking.

We use generators so that we're not allocating new lists in memory for interim data. This especially makes sense for large datasets. Does it make sense for small datasets too? Is there a noticeable memory/CPU trade-off?

I'm especially interested if anyone has done some profiling on this, in light of the eye-opening discussion of list comprehension performance vs. map() and filter().
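For a rough feel of the small-dataset trade-off asked about here, a minimal profiling sketch might look like the following. It assumes Python 3, where range() and zip() are already lazy and stand in for xrange() and izip(). The dataset size and repeat count are arbitrary choices, and the numbers will vary by interpreter and machine, so treat the output as indicative only.

```python
import sys
import timeit

small = range(10)  # arbitrary "small dataset" chosen for illustration

as_list = [x * 2 for x in small]   # materializes every value up front
as_gen = (x * 2 for x in small)    # produces values lazily, one at a time

# getsizeof reports the container object only, not the items it refers to.
list_bytes = sys.getsizeof(as_list)
gen_bytes = sys.getsizeof(as_gen)

# Time building and consuming each form; the repeat count is arbitrary.
list_time = timeit.timeit(lambda: sum([x * 2 for x in small]), number=100_000)
gen_time = timeit.timeit(lambda: sum(x * 2 for x in small), number=100_000)

print(f"list: {list_bytes} bytes, {list_time:.4f}s")
print(f"generator: {gen_bytes} bytes, {gen_time:.4f}s")
```

The generator object stays the same size however many items it will produce, while the list grows with the data, so the memory gap only becomes significant as the dataset gets larger.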

David Eyk asked Oct 29 '08 04:10




1 Answer

Use a list instead of a generator when:

1) You need to access the data multiple times (i.e. cache the results instead of recomputing them):

    for i in outer:           # used once, okay to be a generator or return a list
        for j in inner:       # used multiple times, reusing a list is better
            ...

2) You need random access (or any access other than forward sequential order):

    for i in reversed(data): ...     # generators aren't reversible

    s[i], s[j] = s[j], s[i]          # generators aren't indexable

3) You need to join strings (which requires two passes over the data):

    s = ''.join(data)                # lists are faster than generators in this use case

4) You are using PyPy, which sometimes can't optimize generator code as much as it can with normal function calls and list manipulations.
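Points 1 to 3 above can be illustrated with a short sketch, assuming Python 3. The squares() helper and the sample sizes are hypothetical, chosen only for illustration. The join timings depend on the interpreter and machine, so the sketch prints them rather than claiming a particular result.

```python
import timeit

def squares(n):
    """Hypothetical data source: yield the first n square numbers."""
    for k in range(n):
        yield k * k

# 1) Multiple passes: a generator is exhausted after the first one.
gen = squares(4)
first_pass = sum(gen)     # consumes 0, 1, 4, 9
second_pass = sum(gen)    # nothing left to consume
cached = list(squares(4))
repeat_totals = [sum(cached) for _ in range(3)]   # a list can be reused

# 2) Random access: reversing or indexing a generator raises TypeError.
failures = []
for attempt in (lambda: reversed(squares(4)), lambda: squares(4)[0]):
    try:
        attempt()
    except TypeError as exc:
        failures.append(type(exc).__name__)
cached[0], cached[-1] = cached[-1], cached[0]     # swapping works on a list

# 3) Joining strings: time a list comprehension against a generator expression.
words = [str(n) for n in range(10_000)]
list_time = timeit.timeit(lambda: ''.join([w for w in words]), number=200)
gen_time = timeit.timeit(lambda: ''.join(w for w in words), number=200)
print(f"join via list: {list_time:.4f}s, via generator: {gen_time:.4f}s")
```

Whichever form is used, the joined string is the same. Only the time taken to build it differs.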

Raymond Hettinger answered Sep 18 '22 22:09