Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

zip_longest without fillvalue

I am searching for a middle ground between Python's zip and zip_longest functions (from the itertools module), that exhausts all given iterators, but does not fill in anything. So, for example, it should transpose tuples like so:

(11, 12, 13    ),        (11, 21, 31, 41),
(21, 22, 23, 24),  -->   (12, 22, 32, 42),
(31, 32        ),        (13, 23,     43),
(41, 42, 43, 44),        (    24,     44)

(Spaces added for nicer graphical alignment.)

I managed to compose a crude a solution by cleaning out the fillvalues after zip_longest.

def zip_discard(*iterables, sentinel = object()):
    return map(
            partial(filter, partial(is_not, sentinel)), 
            zip_longest(*iterables, fillvalue=sentinel))

Is there a way to do this without introducing the sentinels to begin with? Can this be improved using yield? Which approach seems most efficient?

like image 751
XZS Avatar asked Jun 27 '16 12:06

XZS


2 Answers

Both zip and zip_longest were designed to always generate tuples of equal length, you can define your own generator that doesn't care about the len with something like this:

def _one_pass(iters):
    for it in iters:
        try:
            yield next(it)
        except StopIteration:
            pass #of some of them are already exhausted then ignore it.

def zip_varlen(*iterables):
    iters = [iter(it) for it in iterables]
    while True: #broken when an empty tuple is given by _one_pass
        val = tuple(_one_pass(iters))
        if val:
            yield val
        else:
            break

If the data being zipped together is fairly large then skipping the exhausted iterators every time can be expensive, it may be more efficient to remove the finished iterators from iters in the _one_pass function like this:

def _one_pass(iters):
    i = 0
    while i<len(iters):
        try:
            yield next(iters[i])
        except StopIteration:
            del iters[i]
        else:
            i+=1

both of these versions would remove the need to create intermediate results or use temporary filler values.

like image 77
Tadhg McDonald-Jensen Avatar answered Oct 02 '22 09:10

Tadhg McDonald-Jensen


You approach is good. I think using a sentinel is elegant. I might be considered a bit more pythonic to use a nested generator expression:

def zip_discard_gen(*iterables, sentinel=object()):
    return ((entry for entry in iterable if entry is not sentinel)
            for iterable in zip_longest(*iterables, fillvalue=sentinel))

This needs fewer imports because there is no need for partial() or ne().

It is a bit faster too:

data = [(11, 12, 13    ),
        (21, 22, 23, 24),
        (31, 32        ),
        (41, 42, 43, 44)]

%timeit [list(x) for x in zip_discard(*data)]  
10000 loops, best of 3: 17.5 µs per loop

%timeit [list(x) for x in zip_discard_gen(*data)]
100000 loops, best of 3: 14.2 µs per loop

EDIT

A list comprehension version is a bit faster:

def zip_discard_compr(*iterables, sentinel=object()):
    return [[entry for entry in iterable if entry is not sentinel]
            for iterable in zip_longest(*iterables, fillvalue=sentinel)]

Timing:

%timeit zip_discard_compr(*data)
100000 loops, best of 3: 6.73 µs per loop

A Python 2 version:

from itertools import izip_longest

SENTINEL = object()

def zip_discard_compr(*iterables):
    sentinel = SENTINEL
    return [[entry for entry in iterable if entry is not sentinel]
            for iterable in izip_longest(*iterables, fillvalue=sentinel)]

Timings

This version returns the same data structure as zip_varlen by Tadhg McDonald-Jensen:

def zip_discard_gen(*iterables, sentinel=object()):
    return (tuple([entry for entry in iterable if entry is not sentinel])
            for iterable in zip_longest(*iterables, fillvalue=sentinel))

It is about twice as fast:

%timeit list(zip_discard_gen(*data))
100000 loops, best of 3: 9.37 µs per loop

%timeit list(zip_varlen(*data))
10000 loops, best of 3: 18 µs per loop
like image 24
Mike Müller Avatar answered Oct 02 '22 10:10

Mike Müller