 

Ensure that an argument can be iterated twice

Tags:

python

Suppose I have the following function:

def print_twice(x):
    for i in x: print(i)
    for i in x: print(i)

When I run:

print_twice([1,2,3])

or:

print_twice((1,2,3))

I get the expected result: the numbers 1,2,3 are printed twice.

But when I run:

print_twice(zip([1,2,3],[4,5,6]))

the pairs (1,4),(2,5),(3,6) are printed only once. Probably, this is because zip returns an iterator that is exhausted after one pass.

How can I modify the function print_twice such that it will correctly handle all inputs?

I could insert a line at the beginning of the function: x = list(x). But this might be inefficient in case x is already a list, a tuple, a range, or any other iterable that can be iterated more than once. Is there a more efficient solution?

Erel Segal-Halevi asked Dec 16 '21 15:12




3 Answers

A simple test to see if x will be consumed when you iterate over it is iter(x) is x. This is reliable, since it's specified as part of the iterator protocol (docs):

Iterators are required to have an __iter__() method that returns the iterator object itself

Conversely, if iter(x) returns x itself then x must be an iterator, since it was returned by the iter function.

Some checks:

def is_iterator(x):
    return iter(x) is x

for obj in [
    # not iterators
    [1, 2, 3],
    (1, 2, 3),
    {1: 2, 3: 4},
    range(3),
    # iterators
    (x for x in range(3)),
    iter([1, 2, 3]),
    zip([1, 2], [3, 4]),
    filter(lambda x: x % 2 == 0, [1, 2, 3]),
    map(lambda x: 2 * x, [1, 2, 3]),
]:
    name = type(obj).__name__
    if is_iterator(obj):
        print(name, 'is an iterator')
    else:
        print(name, 'is not an iterator')

Results:

list is not an iterator
tuple is not an iterator
dict is not an iterator
range is not an iterator
generator is an iterator
list_iterator is an iterator
zip is an iterator
filter is an iterator
map is an iterator

So, to ensure that x can be iterated multiple times, without making an unnecessary copy if it already can be, you can write something like:

if iter(x) is x:
    x = list(x)
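Put together with the original function, the guard might look like the sketch below. The helper name ensure_reiterable is hypothetical (not a library function); the only check it relies on is the iter(x) is x test described above.

```python
def ensure_reiterable(x):
    """Return x unchanged if it can be iterated repeatedly, else a list copy.

    `ensure_reiterable` is a hypothetical helper name, not a library function.
    """
    # Iterators return themselves from iter(); they are consumed by one pass.
    if iter(x) is x:
        return list(x)
    # Lists, tuples, ranges, dicts, ... hand out a fresh iterator each time.
    return x


def print_twice(x):
    x = ensure_reiterable(x)
    for i in x:
        print(i)
    for i in x:
        print(i)


print_twice(zip([1, 2, 3], [4, 5, 6]))  # each pair is printed twice
```

A list, tuple or range passed in is returned as the very same object, so no copy is made in those cases.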
kaya3 answered Oct 20 '22 09:10


I could insert a line at the beginning of the function: x = list(x). But this might be inefficient in case x is already a list, a tuple, a range, or any other iterable that can be iterated more than once. Is there a more efficient solution?

Copying single-use iterables to a list is perfectly adequate, and reasonably efficient even for multi-use iterables.

The list (and to some extent tuple) type is one of the most optimised data structures in Python. Common operations such as copying a list or tuple to a list are internally optimised;[1] even for iterables that are not special-cased, copying them to a list is significantly faster than any realistic work done by two (or more) loops.

def print_twice(x):
    x = list(x)
    for i in x: print(i)
    for i in x: print(i)

Copying indiscriminately can also be advantageous in the context of concurrency, when the iterable may be modified while the function is running. Common cases are threading and weakref collections.


In case one wants to avoid needless copies, checking whether the iterable is a Collection is a reasonable guard.

from collections.abc import Collection

x = list(x) if not isinstance(x, Collection) else x
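To see which common objects pass this guard, here is a small sketch. The helper name as_collection is hypothetical; isinstance(obj, Collection) succeeds for types that provide __len__, __iter__ and __contains__ (or are registered as such), which covers the built-in containers but not generators or zip objects.

```python
from collections.abc import Collection


def as_collection(x):
    # Hypothetical helper: copy only if x is not already a Collection.
    return x if isinstance(x, Collection) else list(x)


samples = {
    "list": [1, 2, 3],
    "tuple": (1, 2, 3),
    "range": range(3),
    "dict keys": {1: 2, 3: 4}.keys(),
    "str": "abc",
    "generator": (i for i in range(3)),
    "zip": zip([1, 2], [3, 4]),
}

# isinstance() only inspects the type; it does not consume any iterator.
for name, obj in samples.items():
    print(f"{name}: Collection={isinstance(obj, Collection)}")
```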

Alternatively, one can check whether the iterable is in fact an iterator, since this implies statefulness and thus single-use.

from collections.abc import Iterator

x = list(x) if isinstance(x, Iterator) else x
x = list(x) if iter(x) is x else x
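The Collection guard and the iterator guard can disagree on objects that are re-iterable without being Collections. A sketch using a hypothetical class that supports only the legacy __getitem__ sequence protocol: the Collection guard would copy it (harmless but unnecessary), while both iterator checks leave it alone.

```python
from collections.abc import Collection, Iterator


class Squares:
    # Hypothetical class using only __getitem__ (no __iter__, __len__).
    # iter() still works by calling __getitem__(0), (1), ... until IndexError.
    def __getitem__(self, i):
        if i >= 3:
            raise IndexError(i)
        return i * i


s = Squares()
print(iter(s) is s)               # False: iter() builds a new iterator each time
print(isinstance(s, Iterator))    # False
print(isinstance(s, Collection))  # False: no __len__ or __contains__
print(list(s), list(s))           # [0, 1, 4] [0, 1, 4]: iterable more than once
```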

Notably, the builtins zip, filter, map, ... and generators are all iterators.


[1] Copying a list of 128 items is roughly as fast as checking whether it is a Collection.
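A rough way to compare the two costs yourself is a timeit sketch like the one below; the absolute numbers, and even the ratio, depend on the machine and Python version, so treat the output as indicative only.

```python
import timeit
from collections.abc import Collection

data = list(range(128))
N = 100_000

# Both statements go through a lambda, so the call overhead is the same.
copy_time = timeit.timeit(lambda: list(data), number=N)
check_time = timeit.timeit(lambda: isinstance(data, Collection), number=N)

print(f"list(data):                   {copy_time:.4f} s for {N} runs")
print(f"isinstance(data, Collection): {check_time:.4f} s for {N} runs")
```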

MisterMiyagi answered Oct 20 '22 10:10


zip returns an iterator. Once it has been iterated over, it is exhausted and cannot be iterated again.

If you want only zip objects to be converted to a list (since, as you said, converting everything would work but might not be efficient), you can check its type:

if isinstance(x, zip):
  x = list(x)
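A minimal sketch of this check inside the original function. Note that it handles zip specifically: other single-use iterators such as map, filter or generators are not zip instances, so they are still printed only once.

```python
def print_twice(x):
    # Materialise zip objects only; every other argument is used as-is.
    if isinstance(x, zip):
        x = list(x)
    for i in x:
        print(i)
    for i in x:
        print(i)


print_twice(zip([1, 2], [3, 4]))  # (1, 3) and (2, 4) are printed twice
print_twice(map(str, [1, 2]))     # 1 and 2 are printed only once: map is not a zip
```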
Alexandru DuDu answered Oct 20 '22 09:10