Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the time complexity of tree traversal using yield from?

An example of depth-first tree traversal:

class Node:
    def __init__(self, value):
        self._value = value
        self._children = []

    def add_child(self, child):
        self._children.append(child)

    def __iter__(self):
        return iter(self._children)

    def depth_first(self):
        yield self
        for c in self:
            yield from c.depth_first()

I understand that yield from does not consume the generator right away, but instead pass the yield upward to its caller.

But this process of passing still exists, and thus yield will be passed from every node all the way up to its root, and we can describe the running time by the recurrence (assume it is a binary tree for simplicity, but the idea is the same):

T(n) = 2*T(n/2) + Θ(n)

Θ(n) exists because this node has to pass all the yield passed from its offsprings to its parent. And the time complexity derived from the formula above is:

O(nlogn)

However, the time complexity of tree traversal is only O(n) if I do not use yield or yield from at all.

I am wondering whether I misunderstand how yield works or it is simply not feasible to write a recursive generator like this.

like image 869
Justin Li Avatar asked Dec 10 '16 19:12

Justin Li


1 Answers

From the official Python 3.3 release for yield from: https://www.python.org/dev/peps/pep-0380/

Using a specialised syntax opens up possibilities for optimisation when there is a long chain of generators. Such chains can arise, for instance, when recursively traversing a tree structure. The overhead of passing next() calls and yielded values down and up the chain can cause what ought to be an O(n) operation to become, in the worst case, O(n**2).

A possible strategy is to add a slot to generator objects to hold a generator being delegated to. When a next() or send() call is made on the generator, this slot is checked first, and if it is nonempty, the generator that it references is resumed instead. If it raises StopIteration, the slot is cleared and the main generator is resumed.

This would reduce the delegation overhead to a chain of C function calls involving no Python code execution. A possible enhancement would be to traverse the whole chain of generators in a loop and directly resume the one at the end, although the handling of StopIteration is more complicated then.

It looks like yield from still requires traversing up the tree. But that traversing is done by the interpreter in C instead of in Python. So technically it's still a O(n) overhead, but it's not as bad as it sounds.

like image 152
Omada Avatar answered Nov 18 '22 06:11

Omada