Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is using a Python generator much slower to traverse binary tree than not?

I've got a binary tree, where the nodes interact with data. I initially implemented a standard post order recursive traversal.

def visit_rec(self, node, data):
    if node:
        self.visit_rec(node.left, data)
        self.visit_rec(node.right, data)

        node.do_stuff(data)

I thought I could improve it by using generators so that I can use the same traversal method for other uses, and not have to pass the same data around constantly. This implementation is shown below.

def visit_rec_gen(self, node):
    if node:
        for n in self.visit_rec_gen(node.left):
                yield n
        for n in self.visit_rec_gen(node.right):
                yield n

        yield node

for node in self.visit_rec_gen():
    node.do_stuff(data)

However, this was far slower than the previous version (~50s to ~17s) and used far more memory. Have I made a mistake in my generator function version? I'd prefer to use this method but not at the expense of performance.

EDIT: Something I should have mentioned initially was that these results were obtained under PyPy 2.3.1, rather than standard CPython.

like image 927
Stuart Lacy Avatar asked Jul 25 '14 18:07

Stuart Lacy


People also ask

Are generators slower than lists?

Thus, generator expressions are faster than list comprehension and hence time efficient.

How generators work in Python?

A Python generator is a function that produces a sequence of results. It works by maintaining its local state, so that the function can resume again exactly where it left off when called subsequent times. Thus, you can think of a generator as something like a powerful iterator.

What is generator and yield in Python?

The Yield keyword in Python is similar to a return statement used for returning values or objects in Python. However, there is a slight difference. The yield statement returns a generator object to the one who calls the function which contains yield, instead of simply returning a value.


2 Answers

On PyPy, function calls are much more highly optimized than generators or iterators.

There are many things that have different performance characteristics in PyPy (for example, PyPy's itertools.islice() performs abyssmally).

You're doing the right thing by measuring the performance to see which way is fastest.

Also note PyPy has tools to show the code that is generated so you get a more detailed answer to the question "what does it do". Of course, the question of "why does it do that" has a human component in the answer that involves what was convenient to implement or the proclivities of the implementers.

like image 94
Raymond Hettinger Avatar answered Sep 19 '22 02:09

Raymond Hettinger


If you're using python3.3, the yield from statement is optimized to be faster than iterating for the purpose of yielding:

def visit_rec_gen(self, node):
    if node:
        yield from self.visit_rec_gen(node.left)
        yield from self.visit_rec_gen(node.right)
        yield node
like image 45
shx2 Avatar answered Sep 21 '22 02:09

shx2