 

Issue with Python3's built-in zip function

Tags:

python

zip

Python 3.4.2 (default, Oct  8 2014, 13:44:52) 
[GCC 4.9.1 20140903 (prerelease)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> gen = (x for x in range(10)) ## Need to wrap range into ()'s to create a generator, next(range(10)) is invalid
>>> list(zip(gen, [1,2,3])) ## zip will "eat up" the number 3
[(0, 1), (1, 2), (2, 3)]
>>> next(gen) ## Here I need next to return 3
4
>>> 

The problem is that I lose a value after the zip call. This would be a bigger issue were it not for the fact that gen is pure code.

I don't know whether it is possible to create a function that behaves like this. It definitely is if only one of the arguments to zip is a generator and the rest are "normal" iterables whose values are all known and stored in memory. In that case you could just check the generator last.

Basically, what I am wondering is whether there is any function in the Python standard library that behaves the way I need in this case.

Of course, in some cases one could just do something like

xs = list(gen)

Then you only have to deal with a list.

I should also add that getting the last value that zip took from gen would also solve this problem.
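A minimal sketch of that last idea, assuming a hand-written wrapper: `LastValue` is a hypothetical class (not part of the standard library) that remembers the most recent item pulled through it. Note that `.last` is the dropped value only when zip() stopped because a *different* argument ran out; if the wrapped iterator is itself exhausted, `.last` is just its final item, which zip() did use.

```python
from itertools import chain

class LastValue:
    """Hypothetical wrapper: an iterator that remembers the last item it produced."""
    _MISSING = object()  # sentinel: nothing has been pulled yet

    def __init__(self, iterable):
        self._it = iter(iterable)
        self.last = self._MISSING

    def __iter__(self):
        return self

    def __next__(self):
        # If the underlying iterator is exhausted, next() raises StopIteration
        # here and .last keeps its previous value.
        self.last = next(self._it)
        return self.last

gen = (x for x in range(10))
tracked = LastValue(gen)
pairs = list(zip(tracked, [1, 2, 3]))  # [(0, 1), (1, 2), (2, 3)]
# zip() pulled 3 through the wrapper before the list ran out; put it back in front.
rest = chain([tracked.last], gen)      # yields 3, 4, ..., 9
```

Because the wrapper sits between zip() and the generator, nothing about zip() itself has to change.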

A_User asked Oct 23 '14 16:10


2 Answers

No, there are no built-in functions that avoid this behaviour.

What happens is that the zip() function tries to get the next value from every input so that it can produce the next tuple. It has to do this in some order, and it is only logical that this order matches the order of the arguments. In fact, the order is guaranteed by the documentation:

The left-to-right evaluation order of the iterables is guaranteed
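To see that order in action, here is a small sketch that records each item as zip() pulls it; `traced` and `log` are hypothetical names used only for this illustration.

```python
log = []  # (argument name, item) pairs, in the order zip() requested them

def traced(name, iterable):
    """Hypothetical helper: yield items from iterable, logging each one."""
    for item in iterable:
        log.append((name, item))
        yield item

pairs = list(zip(traced("gen", range(10)), traced("lst", [1, 2, 3])))
print(log)
# [('gen', 0), ('lst', 1), ('gen', 1), ('lst', 2), ('gen', 2), ('lst', 3), ('gen', 3)]
```

The final entry, `('gen', 3)`, is the value zip() pulled from the first argument before discovering that the list was exhausted.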

Because the function needs to support arbitrary iterables, zip() makes no attempt to determine the length of its arguments. It doesn't know that your second argument has only 3 elements. It simply tries to get the next value from each argument in turn, builds a tuple and returns that. If any argument cannot produce a next value, the zip() iterator is done. But that means it asks your generator for its next element before it asks the list.
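For illustration, a pure-Python sketch modeled on the "roughly equivalent" code shown in the Python documentation for zip(); `simple_zip` is a hypothetical name. It never calls len(): it only asks each iterator for its next item, left to right, and stops at the first one that is exhausted, dropping anything already collected for that round.

```python
def simple_zip(*iterables):
    sentinel = object()  # unique marker meaning "this iterator is exhausted"
    iterators = [iter(it) for it in iterables]
    while iterators:
        result = []
        for it in iterators:
            elem = next(it, sentinel)  # ask each iterator in argument order
            if elem is sentinel:
                # Stop at the first exhausted iterator; items already pulled
                # into `result` this round are simply discarded.
                return
            result.append(elem)
        yield tuple(result)

gen = iter(range(10))
print(list(simple_zip(gen, [1, 2, 3])))  # [(0, 1), (1, 2), (2, 3)]
print(next(gen))  # 4: the 3 went into `result` and was dropped
```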

Apart from changing the order of your inputs, you can build your own zip() function that takes lengths into account where they are available:

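def limited_zip(*iterables):
    # Smallest len() among the arguments that support it (lists, tuples, ...).
    minlength = float('inf')
    for it in iterables:
        try:
            minlength = min(minlength, len(it))
        except TypeError:
            pass  # no len(), e.g. a generator
    iterators = [iter(it) for it in iterables]
    count = 0
    while iterators and count < minlength:
        # An explicit loop instead of tuple(map(next, iterators)): there, a
        # StopIteration would silently end map() and yield truncated tuples forever.
        result = []
        for it in iterators:
            try:
                result.append(next(it))
            except StopIteration:
                return  # an argument without len() ran out
        yield tuple(result)
        count += 1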

So this version of zip() tries to determine the minimum length of any sequences you pass in. It does not protect you when a shorter iterable without a length is in the mix, but it does work for your test case:

Demo:

>>> gen = iter(range(10))
>>> list(limited_zip(gen, [1, 2, 3]))
[(0, 1), (1, 2), (2, 3)]
>>> next(gen)
3
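When the length of the finite argument is known up front, the standard library's `itertools.islice` offers another workaround. A sketch, assuming the other argument supports len(): cap the generator at that length before zipping.

```python
from itertools import islice

gen = (x for x in range(10))
other = [1, 2, 3]
# islice() hands out at most len(other) items and then stops without
# asking gen for another one, so zip() cannot pull and drop an extra value.
pairs = list(zip(islice(gen, len(other)), other))
print(pairs)  # [(0, 1), (1, 2), (2, 3)]
```

After this, `next(gen)` returns 3, as in the demo above.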
Martijn Pieters answered Nov 19 '22 21:11


The problem is that zip(gen, [1,2,3]) pulls 0, 1, 2 and also 3 from gen before it finds that the second argument has only three elements. If you reverse the argument order, next(gen) returns 3:

>>> gen = (x for x in range(10))
>>> list(zip([1,2,3],gen))
[(1, 0), (2, 1), (3, 2)]
>>> next(gen)
3
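The same trick can be generalized. A minimal sketch, assuming you want to keep the caller's argument order in the output tuples: `zip_sized_first` (a hypothetical name) passes the arguments that support len() (checked via `collections.abc.Sized`) to zip() first, so a finite list runs out before any generator is asked for an extra value, then restores the original tuple order. With two or more generators, the earlier one can still lose a value.

```python
from collections.abc import Sized

def zip_sized_first(*iterables):
    """Hypothetical helper: like zip(), but asks sized arguments first."""
    # Stable sort: arguments with len() (key False) before those without (key True).
    order = sorted(range(len(iterables)),
                   key=lambda i: not isinstance(iterables[i], Sized))
    for values in zip(*(iterables[i] for i in order)):
        # Put each value back at its original argument position.
        result = [None] * len(iterables)
        for pos, i in enumerate(order):
            result[i] = values[pos]
        yield tuple(result)

gen = (x for x in range(10))
print(list(zip_sized_first(gen, [1, 2, 3])))  # [(0, 1), (1, 2), (2, 3)]
print(next(gen))  # 3
```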
Irshad Bhat answered Nov 19 '22 19:11