Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

izip_longest in itertools: what's going on here?

I'm struggeling to understand how the below code works. It's from http://docs.python.org/library/itertools.html#itertools.izip_longest, and is the pure-python equivalent of the izip_longest iterator. I'm especially mystified by the sentinel function, how does it work?

def izip_longest(*args, **kwds):
    # izip_longest('ABCD', 'xy', fillvalue='-') --> Ax By C- D-
    fillvalue = kwds.get('fillvalue')
    def sentinel(counter = ([fillvalue]*(len(args)-1)).pop):
        yield counter()         # yields the fillvalue, or raises IndexError
    fillers = repeat(fillvalue)
    iters = [chain(it, sentinel(), fillers) for it in args]
    try:
        for tup in izip(*iters):
            yield tup
    except IndexError:
        pass
like image 258
Lauritz V. Thaulow Avatar asked Mar 14 '11 12:03

Lauritz V. Thaulow


3 Answers

Ok, we can do this. About the sentinel. The expression ([fillvalue]*(len(args)-1)) creates a list that contains one fill value for each iterable in args, minus one. So, for the example above ['-']. counter is then assigned the pop-function of that list. sentinel itself is a generator that pops one item from that list on each iteration. You can iterate over each iterator returned by sentinel exactly once, and it will always yield fillvalue. The total number of items yielded by all iterators returned by sentinel is len(args) - 1 (thanks to Sven Marnach for clarifying that, I misunderstood it).

Now check out this:

iters = [chain(it, sentinel(), fillers) for it in args]

That is the trick. iters is a list that contains an iterator for each iterable in args. Each of these iterators does the following:

  1. Iterate over all items in the corresponding iterable from args.
  2. Iterate over sentinel once, yielding fillvalue.
  3. Repeat fillvalue for all eternity.

Now, as established earlier, we can only iterate over all sentinels together len(args)-1 times before it throws an IndexError. This is fine, because one of the iterables is the longest. So, when we come to the point that the IndexError is raised, that means we have finished iterating over the longest iterable in args.

You are welcome.

P.S.: I hope this is understandable.

like image 173
Björn Pollex Avatar answered Nov 11 '22 14:11

Björn Pollex


The function sentinel() returns iterators yielding fillvalue exactly once. The total number of fillvalues yielded by all iterators returned by sentinel() is limited to n-1, where n is the number of iterators passed to izip_longest(). After this number of fillvalues has been exhausted, further iteration over an iterator returned by sentinel() will raise an IndexError.

This function is used to detect whether all iterators have been exhausted: Each iterator is chain()ed with an iterator returned by sentinel(). If all iterators are exhausted, an iterator returned by sentinel() will get iterated over for the nth time, resulting in an IndexError, triggering the end of izip_longest() in turn.

So far I explained what sentinel() does, not how it works. When izip_longest() is called, the definition of sentinel() is evaluated. While evaluating the definition, also the default argument to sentinel() is evaluated, once per call to izip_longest(). The code is equivalent to

fillvalue_list = [fillvalue] * (len(args)-1)
def sentinel():
    yield fillvalue_list.pop()

Storing this in a default argument instead of in a variable in an enclosing scope is just an optimisation, as is the inclusion of .pop in the default argument, since it save looking it up every time an iterator returned by sentinel() is iterated over.

like image 5
Sven Marnach Avatar answered Nov 11 '22 14:11

Sven Marnach


The definition of sentinel is almost equivalent to

def sentinel():
    yield ([fillvalue] * (len(args) - 1)).pop()

except that it gets the pop bound method (a function object) as a default argument. A default argument is evaluated at the time of function definition, so once per call to izip_longest instead of once per call to sentinel. Therefore, the function object "remembers" the list [fillvalue] * (len(args) - 1) instead of constructing this anew in every call.

like image 2
Fred Foo Avatar answered Nov 11 '22 12:11

Fred Foo