 

Python generator vs callback function

I have a class that solves an exact cover problem using a recursive, backtracking algorithm. Originally, I implemented the class with a callback function passed to the object during initialization; this callback is invoked whenever a solution is found. In looking at someone else's implementation of the same problem, I saw that they were using yield statements to pass each solution out; in other words, their code was a Python generator. I thought this was an interesting idea, so I made a new version of my class that uses yields. I then ran comparison tests between the two versions and, to my surprise, found the generator version ran 5 times slower than the callback version. Note that, except for substituting yields for the callback, the code is identical.

What is going on here? I'm speculating that, because a generator needs to save state information before yielding and then restore that state when restarting at the next call, it is this save/restore that makes the generator version run so much slower. If this is the case, how much state information is the generator having to save and restore?

Any ideas from the Python experts?
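For reference, the state a generator saves is its suspended frame: the local variables plus the instruction pointer. CPython keeps that frame alive between next() calls rather than copying anything, which can be observed with the standard inspect module; a minimal sketch:

   import inspect

   def gen():
      x = 42      # local state, kept alive while the generator is suspended
      yield 1
      yield x

   g = gen()
   print(inspect.getgeneratorstate(g))   # GEN_CREATED
   next(g)
   print(inspect.getgeneratorstate(g))   # GEN_SUSPENDED: frame retained, not copied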

--Edited 7:40 PDT

Here is the solver code which uses yield. To recover the original callback version, replace the first yield below with a call to the callback function, and change the loop containing the second yield to a plain recursive call to solve (a sketch of the result appears after the driver loop below).

   def solve(self):
      for tp in self.pieces:
         if self.inuse[tp.name]: continue

         # Tentatively place this piece, trying each orientation in turn.
         self.inuse[tp.name] = True
         while tp.next_orientation() is not None:
            if tp.insert_piece():
               self.n_trials += 1
               self.pieces_in += 1
               self.free_cells -= tp.size

               if self.pieces_in == len(self.pieces) or self.free_cells == 0:
                  # Complete solution: hand control back to the caller.
                  self.solutions += 1
                  self.haveSolution = True
                  yield True
                  self.haveSolution = False
               else:
                  # Recurse, re-yielding each solution found deeper down.
                  self.table.next_base_square()
                  for tf in self.solve():
                     yield tf

               # Backtrack: undo the placement before the next attempt.
               tp.remove_piece()
               self.pieces_in -= 1
               self.table.set_base_square(tp.base_square)
               self.free_cells += tp.size

         self.inuse[tp.name] = False
         tp.reset_orientation()

The main loop which invokes the solver (after initialization, of course) is

   start_time = time.time()
   for tf in s.solve():
      printit(s)

   end_time = time.time()
   delta_time = end_time - start_time

In the callback version, the loop is gone with just a single call to solve.
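For comparison, a minimal sketch of the changed portion of the callback version described above (the attribute name self.callback is an assumption; only the two marked lines differ from the yield version):

   if self.pieces_in == len(self.pieces) or self.free_cells == 0:
      self.solutions += 1
      self.haveSolution = True
      self.callback(self)             # was: yield True
      self.haveSolution = False
   else:
      self.table.next_base_square()
      self.solve()                    # was: for tf in self.solve(): yield tf

The driver then reduces to a single s.solve() call, presumably with printit passed in as the callback during initialization.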

asked Apr 18 '11 by sizzzzlerz



1 Answer

What I meant in my comment ("yielding from a recursive function sounds like it requires extra for loops to pass the results down to the caller") was these lines:

          for tf in self.solve():
             yield tf

These lines loop over the results coming up from the deeper recursion stages. That means each individual result is iterated over once at every level of the recursion above it, resulting in a lot of unnecessary looping.

Let me illustrate with this example:

n = 0
def rekurse(z):
    global n
    if z:
        yield z
        # Every value produced by the deeper levels passes through
        # this loop; n counts those pass-through iterations.
        for x in rekurse(z - 1):
            n += 1
            yield x

print(list(rekurse(10)))
print(n)

As you can see, this simply counts down from 10, so you'd expect a linear number of iterations. What you actually see, though, is that n grows quadratically: rekurse(10) loops over 9 items, rekurse(9) over 8 items, and so on, so n ends up at 45 (9 + 8 + ... + 1) for an input of 10.

The more items you have, the more time Python spends on these simple lines. Callbacks avoid that overhead entirely, so I'd suspect this is what is slowing down your code.
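For contrast, a callback version of the same toy example (a sketch) hands each value directly to the consumer exactly once, with no pass-through loops at the intermediate recursion levels:

def rekurse_cb(z, callback):
    if z:
        callback(z)              # delivered straight to the consumer
        rekurse_cb(z - 1, callback)

results = []
rekurse_cb(10, results.append)
print(results)                   # [10, 9, ..., 1]: 10 callback calls in total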

An optimized implementation of PEP 380 could fix this (see this paragraph). In the meantime, I don't think it's a good idea to yield from recursive functions (at least if they recurse deeply); they just don't work well together.
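For readers on Python 3.3 or later: PEP 380's yield from expresses this delegation directly. Whether it removes the per-level cost depends on the implementation (the optimisation discussion in PEP 380 covers exactly this case), but it at least replaces the explicit pass-through loop:

def rekurse(z):
    if z:
        yield z
        # Delegate to the subgenerator instead of looping manually.
        yield from rekurse(z - 1)

print(list(rekurse(10)))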

answered by Jochen Ritzel