
What can you use Python generator functions for?

People also ask

What are Python generators used for?

Python generator functions let you declare a function that behaves like an iterator, so you can write an iterator in a fast, easy, and clean way. An iterator is an object that can be iterated (looped) over; it abstracts a container of data so that it behaves like an iterable object.

When should I use a generator Python?

Generators are great when a problem requires reading from a large dataset, which would otherwise force your computer or server to allocate memory for all of it at once. The one condition to remember is that a generator can only be iterated over once.

What is the benefit of Python generator?

Here is a summary of the advantages of generator expressions in Python: they are a memory-efficient way of producing sequences, they add brevity and readability to your code, and a generator expression is essentially a shorthand for a generator function.
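As a rough illustration of that memory difference (the numbers below are only an example), a generator expression produces its values lazily, while the equivalent list comprehension builds the whole list up front:

import sys

squares_list = [n * n for n in range(1000000)]   # builds the full list in memory
squares_gen = (n * n for n in range(1000000))    # generator expression: nothing computed yet

print(sys.getsizeof(squares_list))  # several megabytes
print(sys.getsizeof(squares_gen))   # a couple of hundred bytes, regardless of length
print(sum(squares_gen))             # values are produced one at a time as sum() consumes them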

Why would you use a generator over a loop in Python?

Generators can also be looped over like a regular list. The benefit of using a for loop is that it runs through every yielded value from start to end, so we do not have to worry about the function raising a StopIteration exception when there is no more data to go through.
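A minimal sketch of that difference (the small counting generator here is just for illustration):

def count_to(n):
    for i in range(n):
        yield i

gen = count_to(2)
print(next(gen))    # 0
print(next(gen))    # 1
# print(next(gen))  # a third call would raise StopIteration

for value in count_to(2):   # the for loop handles StopIteration for us
    print(value)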


Generators give you lazy evaluation. You use them by iterating over them, either explicitly with 'for' or implicitly by passing them to any function or construct that iterates. You can think of generators as returning multiple items, as if they returned a list, but instead of returning them all at once they return them one by one, and the generator function is paused until the next item is requested.
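For instance (a toy sketch), a countdown generator pauses at each yield until the caller asks for the next value:

def countdown(n):
    print("starting")
    while n > 0:
        yield n        # pause here until the next value is requested
        n -= 1

it = countdown(3)
print(next(it))   # prints "starting", then 3
print(next(it))   # 2 -- execution resumed right after the yield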

Generators are good for calculating large sets of results (in particular calculations involving loops themselves) where you don't know if you are going to need all of the results, or where you don't want to allocate the memory for all of the results at the same time. They are also good for situations where the generator uses another generator, or consumes some other resource, and it's more convenient if that happens as late as possible.

Another use for generators (that is really the same) is to replace callbacks with iteration. In some situations you want a function to do a lot of work and occasionally report back to the caller. Traditionally you'd use a callback function for this: you pass the callback to the work-function, which calls it periodically. The generator approach is that the work-function (now a generator) knows nothing about the callback and merely yields whenever it wants to report something. The caller, instead of writing a separate callback and passing it to the work-function, does all the reporting work in a little 'for' loop around the generator.
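A rough sketch of the two styles (the function and callback names are made up for illustration):

# Callback style: the work-function has to know about the reporter.
def do_work_with_callback(items, report):
    for item in items:
        # ... some expensive work per item ...
        report(item)

do_work_with_callback(range(3), lambda item: print("done:", item))

# Generator style: the work-function just yields; the caller decides how to report.
def do_work(items):
    for item in items:
        # ... some expensive work per item ...
        yield item

for item in do_work(range(3)):
    print("done:", item)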

For example, say you wrote a 'filesystem search' program. You could perform the search in its entirety, collect the results and then display them one at a time. All of the results would have to be collected before you showed the first, and all of the results would be in memory at the same time. Or you could display the results while you find them, which would be more memory efficient and much friendlier towards the user. The latter could be done by passing the result-printing function to the filesystem-search function, or it could be done by just making the search function a generator and iterating over the result.

If you want to see an example of the latter two approaches, see os.path.walk() (the old callback-based filesystem-walking function, removed in Python 3) and os.walk() (the newer filesystem-walking generator). Of course, if you really wanted to collect all results in a list, the generator approach is trivial to convert to the big-list approach:

big_list = list(the_generator)
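Here is a hedged sketch of the generator-based search described above (the matching rule is deliberately simplistic):

import os

def find_files(root, substring):
    # Yield matching paths as soon as they are found,
    # instead of collecting everything into a list first.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            if substring in name:
                yield os.path.join(dirpath, name)

for path in find_files(".", ".py"):
    print(path)   # results appear while the walk is still in progress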

One of the reasons to use a generator is to make the solution clearer for some kinds of problems.

The other is to treat results one at a time, avoiding building huge lists of results that you would only process one at a time anyway.

If you have a function that builds the first n Fibonacci numbers, like this:

# function version
def fibon(n):
    a = b = 1
    result = []
    for i in range(n):
        result.append(a)
        a, b = b, a + b
    return result

You can write the same function more easily as this:

# generator version
def fibon(n):
    a = b = 1
    for i in range(n):
        yield a
        a, b = b, a + b

The function is clearer. And if you use the function like this:

for x in fibon(1000000):
    print(x)

in this example, with the generator version, the whole list of 1,000,000 items is never created at all; only one value exists at a time. That would not be the case with the list version, where the entire list would be built first.
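Because the values are produced lazily, you can also stop early without computing the rest; for example (the cut-off of 10 is arbitrary), using itertools:

from itertools import islice

for x in islice(fibon(1000000), 10):   # with the generator version, only the first 10 values are ever computed
    print(x)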


I found this explanation helpful, and it cleared up my doubts, because there is a good chance that someone who doesn't know about generators doesn't know about yield either.

Return

The return statement is where all the local variables are destroyed and the resulting value is given back (returned) to the caller. Should the same function be called some time later, the function will get a fresh new set of variables.

Yield

But what if the local variables aren't thrown away when we exit a function? That would imply we could resume the function where we left off. This is where generators come in: the yield statement hands a value back and later resumes the function where it left off.

def generate_integers(N):
    for i in range(N):
        yield i

In [1]: gen = generate_integers(3)
In [2]: gen
<generator object at 0x8117f90>
In [3]: next(gen)
0
In [4]: next(gen)
1
In [5]: next(gen)
2

So that's the difference between return and yield statements in Python.

The yield statement is what makes a function a generator function.

So generators are a simple and powerful tool for creating iterators. They are written like regular functions, but they use the yield statement whenever they want to return data. Each time next() is called, the generator resumes where it left off (it remembers all the data values and which statement was last executed).


See the "Motivation" section in PEP 255.

A non-obvious use of generators is creating interruptible functions, which lets you do things like update the UI or run several jobs "simultaneously" (interleaved, actually) without using threads.
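A minimal sketch of that idea, with two made-up 'jobs' interleaved by a tiny round-robin loop (no threads involved):

from collections import deque

def job(name, steps):
    for i in range(steps):
        print(f"{name}: step {i}")
        yield                      # hand control back to the scheduler after each step

def run_round_robin(jobs):
    queue = deque(jobs)
    while queue:
        current = queue.popleft()
        try:
            next(current)          # run the job until its next yield
            queue.append(current)  # not finished yet, put it back in line
        except StopIteration:
            pass                   # job finished, drop it

run_round_robin([job("a", 3), job("b", 2)])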


Real World Example

Let's say you have 100 million domains in your MySQL table, and you would like to update Alexa rank for each domain.

First thing you need is to select your domain names from the database.

Let's say your table name is domains and column name is domain.

If you use SELECT domain FROM domains, it's going to return 100 million rows, which will consume a lot of memory. Your server might crash.

So you decide to run the program in batches. Let's say our batch size is 1000.

In our first batch we will query the first 1000 rows, check Alexa rank for each domain and update the database row.

In our second batch we will work on the next 1000 rows. In our third batch it will be from 2001 to 3000 and so on.

Now we need a generator function which generates our batches.

Here is our generator function:

def ResultGenerator(cursor, batchsize=1000):
    while True:
        results = cursor.fetchmany(batchsize)
        if not results:
            break
        for result in results:
            yield result

As you can see, our function keeps yielding results. If you used the keyword return instead of yield, the whole function would end as soon as it reached return.

return - returns only once
yield - returns multiple times

If a function uses the keyword yield then it's a generator.

Now you can iterate like this:

import MySQLdb

db = MySQLdb.connect(host="localhost", user="root", passwd="root", db="domains")
cursor = db.cursor()
cursor.execute("SELECT domain FROM domains")
for result in ResultGenerator(cursor):
    doSomethingWith(result)
db.close()

Buffering. When it is efficient to fetch data in large chunks but process it in small chunks, a generator might help:

def bufferedFetch():
    while True:
        buffer = getBigChunkOfData()   # hypothetical helper that fetches one large chunk
        if not buffer:                 # break on 'end of data'
            break
        for i in buffer:
            yield i

The above lets you easily separate buffering from processing. The consumer function can now just get the values one by one without worrying about buffering.
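As a concrete, self-contained variant of the same pattern (the file name and chunk size are made up), this reads a file in large blocks but yields it to the consumer line by line:

def buffered_lines(path, chunk_size=64 * 1024):
    # Read the file in large chunks, but hand lines to the caller one at a time.
    leftover = ""
    with open(path) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            chunk = leftover + chunk
            lines = chunk.split("\n")
            leftover = lines.pop()   # the last piece may be an incomplete line
            for line in lines:
                yield line
    if leftover:
        yield leftover

for line in buffered_lines("domains.txt"):
    print(line)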