I want to parse 2 generators of (potentially) different lengths with `zip`:

```python
for el1, el2 in zip(gen1, gen2):
    print(el1, el2)
```
However, if `gen2` has fewer elements, one extra element of `gen1` is "consumed".
For example:

```python
def my_gen(n: int):
    for i in range(n):
        yield i

gen1 = my_gen(10)
gen2 = my_gen(8)
list(zip(gen1, gen2))  # Last tuple is (7, 7)
print(next(gen1))      # printed value is "9" => 8 is missing

gen1 = my_gen(8)
gen2 = my_gen(10)
list(zip(gen1, gen2))  # Last tuple is (7, 7)
print(next(gen2))      # printed value is "8" => OK
```
Apparently, a value is missing (`8` in my previous example) because `gen1` is read (thus generating the value `8`) before `zip` realizes `gen2` has no more elements. This value then disappears into the void. When `gen2` is the longer one, there is no such "problem".
QUESTION: Is there a way to retrieve this missing value (i.e. `8` in my previous example)? ...ideally with a variable number of arguments (like `zip` does).
NOTE: I have currently implemented it another way using `itertools.zip_longest`, but I really wonder how to get this missing value using `zip` or an equivalent.
NOTE 2: I have created some tests of the different implementations in this REPL in case you want to submit and try a new implementation :) https://repl.it/@jfthuong/MadPhysicistChester
Right out of the box, zip() is hardwired to dispose of the unmatched item. So, you need a way to remember values before they get consumed.
The itertool called `tee()` was designed for this purpose. You can use it to create a "shadow" of the first input iterator. If the second iterator terminates, you can fetch the first iterator's value from the shadow iterator.
Here's one way to do it that uses existing tooling, that runs at C-speed, and that is memory efficient:
```python
>>> from itertools import tee
>>> from operator import itemgetter
>>> iterable1, iterable2 = 'abcde', 'xyz'
>>> it1, shadow1 = tee(iterable1)
>>> it2 = iter(iterable2)
>>> combined = map(itemgetter(0, 1), zip(it1, it2, shadow1))
>>> list(combined)
[('a', 'x'), ('b', 'y'), ('c', 'z')]
>>> next(shadow1)
'd'
```
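Since the question asks for a variable number of arguments, the same trick appears to generalize if every input gets a shadow: `zip()` only pulls the shadows after a full round of real values has succeeded, so a value consumed in the failed final round stays buffered in its shadow. This is a generalization of the snippet above, not part of the original answer, and `zip_with_shadows` is a made-up name:

```python
from itertools import tee

def zip_with_shadows(*iterables):
    # Tee every input so each one has a lagging "shadow" copy.
    pairs = [tee(it) for it in iterables]
    its = [a for a, _ in pairs]
    shadows = [b for _, b in pairs]
    n = len(its)
    # The shadows sit after the real iterators in the zip() call, so
    # they advance only once a full tuple of real values succeeded.
    zipped = (t[:n] for t in zip(*its, *shadows))
    return zipped, shadows

zipped, shadows = zip_with_shadows('abcde', 'xyz')
print(list(zipped))      # [('a', 'x'), ('b', 'y'), ('c', 'z')]
print(next(shadows[0]))  # 'd', the value zip() consumed
```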
One way would be to implement a generator that lets you cache the last value:
```python
import collections.abc

class cache_last(collections.abc.Iterator):
    """
    Wraps an iterable in an iterator that can retrieve the last value.

    .. attribute:: obj

       A reference to the wrapped iterable. Provided for convenience
       of one-line initializations.
    """
    def __init__(self, iterable):
        self.obj = iterable
        self._iter = iter(iterable)
        self._sentinel = object()

    @property
    def last(self):
        """
        The last object yielded by the wrapped iterator.

        Uninitialized iterators raise a `ValueError`. Exhausted
        iterators raise a `StopIteration`.
        """
        if self.exhausted:
            raise StopIteration
        return self._last

    @property
    def exhausted(self):
        """
        `True` if there are no more elements in the iterator.

        Violates EAFP, but is a convenient way to check if `last` is
        valid. Raises a `ValueError` if the iterator is not yet started.
        """
        if not hasattr(self, '_last'):
            raise ValueError('Not started!')
        return self._last is self._sentinel

    def __next__(self):
        """
        Retrieve, record, and return the next value of the iteration.
        """
        try:
            self._last = next(self._iter)
        except StopIteration:
            self._last = self._sentinel
            raise
        # An alternative that has fewer lines of code, but checks
        # for the return value one extra time, and loses the underlying
        # StopIteration:
        #self._last = next(self._iter, self._sentinel)
        #if self._last is self._sentinel:
        #    raise StopIteration
        return self._last

    def __iter__(self):
        """
        This object is already an iterator.
        """
        return self
```
To use this, wrap the inputs to `zip`:

```python
gen1 = cache_last(range(10))
gen2 = iter(range(8))
list(zip(gen1, gen2))
print(gen1.last)   # 8, the value zip() consumed from gen1
print(next(gen1))  # 9
```
It is important to make `gen2` an iterator rather than an iterable, so you can know which one was exhausted. If `gen2` is exhausted, you don't need to check `gen1.last`.
Another approach would be to override zip to accept a mutable sequence of iterables instead of separate iterables. That would allow you to replace iterables with a chained version that includes your "peeked" item:
```python
import itertools

def myzip(iterables):
    iterators = [iter(it) for it in iterables]
    while True:
        items = []
        for it in iterators:
            try:
                items.append(next(it))
            except StopIteration:
                # Chain the already-consumed values back onto the
                # front of their (now partially advanced) iterators.
                for i, peeked in enumerate(items):
                    iterables[i] = itertools.chain([peeked], iterators[i])
                return
        else:
            yield tuple(items)

gens = [range(10), range(8)]
list(myzip(gens))
print(next(gens[0]))  # 8, recovered from the chain
```
This approach is problematic for many reasons. Not only does it lose the original iterable, it also loses any useful properties the original object may have had, since it gets replaced with a `chain` object.
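A less intrusive variant of the same loop (a sketch of an alternative, with `myzip2` as a made-up name) avoids mutating the input list and instead returns the discarded values alongside the pairs:

```python
def myzip2(*iterables):
    iterators = [iter(it) for it in iterables]
    result = []
    while True:
        items = []
        for it in iterators:
            try:
                items.append(next(it))
            except StopIteration:
                # items holds the values zip() would have discarded
                return result, items
        result.append(tuple(items))

pairs, leftovers = myzip2(range(10), range(8))
print(pairs[-1])  # (7, 7)
print(leftovers)  # [8]
```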