I've got a binary tree, where the nodes interact with data. I initially implemented a standard post order recursive traversal.
def visit_rec(self, node, data):
if node:
self.visit_rec(node.left, data)
self.visit_rec(node.right, data)
node.do_stuff(data)
I thought I could improve it by using generators so that I can use the same traversal method for other uses, and not have to pass the same data around constantly. This implementation is shown below.
def visit_rec_gen(self, node):
if node:
for n in self.visit_rec_gen(node.left):
yield n
for n in self.visit_rec_gen(node.right):
yield n
yield node
for node in self.visit_rec_gen():
node.do_stuff(data)
However, this was far slower than the previous version (~50s to ~17s) and used far more memory. Have I made a mistake in my generator function version? I'd prefer to use this method but not at the expense of performance.
EDIT: Something I should have mentioned initially was that these results were obtained under PyPy 2.3.1, rather than standard CPython.
Thus, generator expressions are faster than list comprehension and hence time efficient.
A Python generator is a function that produces a sequence of results. It works by maintaining its local state, so that the function can resume again exactly where it left off when called subsequent times. Thus, you can think of a generator as something like a powerful iterator.
The Yield keyword in Python is similar to a return statement used for returning values or objects in Python. However, there is a slight difference. The yield statement returns a generator object to the one who calls the function which contains yield, instead of simply returning a value.
On PyPy, function calls are much more highly optimized than generators or iterators.
There are many things that have different performance characteristics in PyPy (for example, PyPy's itertools.islice() performs abyssmally).
You're doing the right thing by measuring the performance to see which way is fastest.
Also note PyPy has tools to show the code that is generated so you get a more detailed answer to the question "what does it do". Of course, the question of "why does it do that" has a human component in the answer that involves what was convenient to implement or the proclivities of the implementers.
If you're using python3.3, the yield from
statement is optimized to be faster than iterating for the purpose of yielding:
def visit_rec_gen(self, node):
if node:
yield from self.visit_rec_gen(node.left)
yield from self.visit_rec_gen(node.right)
yield node
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With