Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

python itertools round robin explaintation

I came across this code snippet for round robin in https://more-itertools.readthedocs.io/en/latest/api.html, but I couldn't understand how the last line nexts = cycle(islice(nexts, pending)) removes the exhausted generator from nexts iterator. To my understanding, this removes the last one from input but shouldn't we remove the first one, i.e. the current one that throws this exception?

def roundrobin(*iterables):
    """Yields an item from each iterable, alternating between them.

        >>> list(roundrobin('ABC', 'D', 'EF'))
        ['A', 'D', 'E', 'B', 'F', 'C']

    This function produces the same output as :func:`interleave_longest`, but
    may perform better for some inputs (in particular when the number of
    iterables is small).

    """
    # Recipe credited to George Sakkis
    pending = len(iterables)
    if PY2:
        nexts = cycle(iter(it).next for it in iterables)
    else:
        nexts = cycle(iter(it).__next__ for it in iterables)
    while pending:
        try:
            for next in nexts:
                yield next()
        except StopIteration:
            pending -= 1
            nexts = cycle(islice(nexts, pending))
like image 307
xuanyue Avatar asked Jun 18 '18 00:06

xuanyue


2 Answers

I'll break it up for you, step by step:

>>> list(roundrobin('ABC', 'D', 'EF'))


pending = 3  # len(('ABC', 'D', 'EF'))

# nexts roughly equivalent to:
['ABC', 'D', 'EF', 'ABC', 'D', 'EF', 'ABC', ...]

***

# the following get yielded 
'A' from 'ABC' # nexts: ['D' , 'EF', 'BC', 'D' , 'EF', ...]
'D' from 'D'   # nexts: ['EF', 'BC', ''  , 'EF', 'BC', ...]
'E' from 'EF'  # nexts: ['BC', ''  , 'F' , 'BC', ''  , ...]
'B' from 'BC'  # nexts: [''  , 'F' , 'C' , ''  , 'F' , ...]

# StopIteration was raised by what used to be "B"
SI from ''  # nexts:  ['F', 'C', '', 'F', 'C', '', 'F', ...]
#                                 ^ index 2 (`pending`[see the following line])

# pending -= 1
pending = 2

# when islice() is used with one argument, it defaults to the "stop" index
# so islice() returns ['F', 'C']
# and cycle() converts it to ['F', 'C', 'F', 'C', ...]

pending = 2
nexts: ['F', 'C', 'F', 'C', ...]

# Go back to *** and continue until pending = 0

And that's how the last line removes the exhausted iterable.


Main idea:

for next in nexts:
    yield next()

Even if StopIteration is raised, the exhausted iterable had already been used from nexts.

So instead of the exhausted iterable remaining as the first item in nexts:

nexts = ['', 'F', 'C', '', 'F', 'C', '', 'F', ...]

The first item of nexts is actually the next item:

nexts = ['F', 'C', '', 'F', 'C', '', 'F', ...]
like image 187
Taku Avatar answered Oct 05 '22 23:10

Taku


Because cycle() creates an infinite iterator, you are not always slicing from the "beginning" of iterables when islice() is used in the final line. To simplify a bit, nexts as it is originally defined will "loop back" to the beginning of iterables:

try:
    for next in nexts:
        print(next())
except StopIteration:
    print('decrementing...')
A
D
E
B
decrementing...

The goal at this point is to remove the exhausted iterator that was formed from 'D'. The key is that you're now "at" 'EF', with the iterator that was formed from 'ABC' being next in line, so islice(nexts, pending) will cycle from 'EF' to 'ABC'.

Here is a related example:

x = cycle('abcd')
#          ^  |
#          |__|

for _ in range(3):
    x.__next__()

list(islice(x, 2))
# ['d', 'a']
like image 32
Brad Solomon Avatar answered Oct 05 '22 23:10

Brad Solomon