Suppose I have the following function:
def print_twice(x):
    for i in x: print(i)
    for i in x: print(i)
When I run:
print_twice([1,2,3])
or:
print_twice((1,2,3))
I get the expected result: the numbers 1,2,3 are printed twice.
But when I run:
print_twice(zip([1,2,3],[4,5,6]))
the pairs (1,4),(2,5),(3,6) are printed only once. Presumably this is because zip returns an iterator that is exhausted after one pass.
How can I modify the function print_twice so that it handles all inputs correctly?
I could insert a line at the beginning of the function: x = list(x). But this might be inefficient if x is already a list, a tuple, a range, or any other iterable that can be iterated more than once. Is there a more efficient solution?
Iterators can generally not be iterated twice because there might be a cost to their iteration. In the case of Rust's str::lines, each iteration needs to find the next end of line, which means scanning through the string, which has some cost.
An iterator is an object that can be iterated upon, meaning that you can traverse through all of its values one at a time. Technically, in Python, an iterator is an object which implements the iterator protocol, which consists of the methods __iter__() and __next__().
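As a minimal sketch of the iterator protocol described above, the class below implements __iter__() and __next__() by hand. The name Countdown is hypothetical and only serves as an illustration; the point is that an object of this kind yields its values once and is then exhausted.

```python
class Countdown:
    """Hypothetical iterator that counts down from `start` to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__().
        return self

    def __next__(self):
        # Signal exhaustion once the counter reaches zero.
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


countdown = Countdown(3)
first_pass = list(countdown)   # consumes the iterator
second_pass = list(countdown)  # nothing is left to yield
print(first_pass, second_pass)  # [3, 2, 1] []
```

Because the state lives in the iterator object itself, a second loop over the same object finds nothing left, which is the behaviour seen with zip in the question.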
A simple test to see whether x will be consumed when you iterate over it is iter(x) is x. This is reliable, since it's specified as part of the iterator protocol (docs):
Iterators are required to have an __iter__() method that returns the iterator object itself
Conversely, if iter(x) returns x itself, then x must be an iterator, since it was returned by the iter function.
Some checks:
def is_iterator(x):
    return iter(x) is x

for obj in [
    # not iterators
    [1, 2, 3],
    (1, 2, 3),
    {1: 2, 3: 4},
    range(3),
    # iterators
    (x for x in range(3)),
    iter([1, 2, 3]),
    zip([1, 2], [3, 4]),
    filter(lambda x: x % 2 == 0, [1, 2, 3]),
    map(lambda x: 2 * x, [1, 2, 3]),
]:
    name = type(obj).__name__
    if is_iterator(obj):
        print(name, 'is an iterator')
    else:
        print(name, 'is not an iterator')
Results:
list is not an iterator
tuple is not an iterator
dict is not an iterator
range is not an iterator
generator is an iterator
list_iterator is an iterator
zip is an iterator
filter is an iterator
map is an iterator
So, to ensure that x can be iterated multiple times, without making an unnecessary copy if it already can be, you can write something like:
if iter(x) is x:
    x = list(x)
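Put together with the function from the question, the guard might look like the sketch below. The helper name ensure_reiterable is an assumption introduced here for illustration; it is not part of the question or of any library.

```python
def ensure_reiterable(x):
    # Hypothetical helper: copy only when x is an iterator,
    # i.e. when a single pass would exhaust it.
    if iter(x) is x:
        return list(x)
    return x


def print_twice(x):
    x = ensure_reiterable(x)
    for i in x:
        print(i)
    for i in x:
        print(i)


print_twice(zip([1, 2, 3], [4, 5, 6]))  # each pair is now printed twice
```

A list, tuple, or range passes through unchanged, so no copy is made for inputs that already support repeated iteration.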
I could insert a line at the beginning of the function: x = list(x). But this might be inefficient in case x is already a list, a tuple, a range, or any other iterable that can be iterated more than once. Is there a more efficient solution?
Copying single-use iterables to a list is perfectly adequate, and reasonably efficient even for multi-use iterables.
The list (and to some extent tuple) type is one of the most optimised data structures in Python. Common operations such as copying a list or tuple to a list are internally optimised;¹ even for iterables that are not special-cased, copying them to a list is significantly faster than any realistic work done by two (or more) loops.
def print_twice(x):
    x = list(x)
    for i in x: print(i)
    for i in x: print(i)
Copying indiscriminately can also be advantageous in the context of concurrency, when the iterable may be modified while the function is running. Common cases are threading and weakref collections.
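The effect of modification during iteration can be reproduced without threads. The sketch below is a single-threaded stand-in, not a real concurrency scenario: the loop body itself removes items from the set being iterated. The function names drain_live and drain_snapshot are hypothetical.

```python
def drain_live(pending):
    # Iterates over the set itself while removing from it.
    for item in pending:
        pending.discard(item)


def drain_snapshot(pending):
    # Iterates over a private copy, so removals cannot disturb the loop.
    for item in list(pending):
        pending.discard(item)


tasks = {"a", "b", "c"}
try:
    drain_live(tasks)
except RuntimeError as error:
    print("live iteration failed:", error)

tasks = {"a", "b", "c"}
drain_snapshot(tasks)
print("after snapshot iteration:", tasks)
```

The first call raises a RuntimeError because the set changes size mid-loop; the second completes because the loop only ever sees the copy made up front.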
In case one wants to avoid needless copies, checking whether the iterable is a Collection is a reasonable guard.
from collections.abc import Collection
x = list(x) if not isinstance(x, Collection) else x
Alternatively, one can check whether the iterable is in fact an iterator, since this implies statefulness and thus single-use.
from collections.abc import Iterator
x = list(x) if isinstance(x, Iterator) else x
x = list(x) if iter(x) is x else x
Notably, the builtins zip, filter, map, ... and generators are all iterators.
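The guards above agree for the common builtins but can differ for custom types. The sketch below compares them on a list, a zip object, and a hypothetical class Squares that defines only __iter__ and returns a fresh generator each time. Both Squares and the helper checks are assumptions made for illustration.

```python
from collections.abc import Collection, Iterator


class Squares:
    """Hypothetical re-iterable defining only __iter__ (no __len__, no __contains__)."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        # Each call creates a fresh generator, so repeated loops work.
        return (i * i for i in range(self.n))


def checks(x):
    # Each entry answers "would this guard copy x?" for the three variants:
    # not a Collection, is an Iterator, iter(x) is x.
    return (not isinstance(x, Collection), isinstance(x, Iterator), iter(x) is x)


print(checks([1, 2, 3]))      # (False, False, False)
print(checks(zip([1], [2])))  # (True, True, True)
print(checks(Squares(3)))     # (True, False, False)
```

The Collection guard copies Squares even though it can be iterated repeatedly, while the two iterator-based guards leave it alone. Which behaviour is preferable depends on whether an occasional needless copy or an occasional missed copy is the safer failure mode.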
¹ Copying a list of 128 items is roughly as fast as checking whether it is a Collection.
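One way to check the footnote's claim on a given machine is a small timeit comparison such as the sketch below. The repetition count is an arbitrary choice, and the absolute and relative numbers will vary with the Python version and hardware, so no expected output is shown.

```python
import timeit
from collections.abc import Collection

data = list(range(128))

# Time 100,000 copies of a 128-item list.
copy_seconds = timeit.timeit(lambda: list(data), number=100_000)

# Time 100,000 Collection checks on the same list.
check_seconds = timeit.timeit(lambda: isinstance(data, Collection), number=100_000)

print(f"list(data):                   {copy_seconds:.4f} s")
print(f"isinstance(data, Collection): {check_seconds:.4f} s")
```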
zip returns an iterator. Once it has been iterated over, it is exhausted and cannot be iterated again.
If you want to make sure that only zip objects get converted to a list (converting everything would work, as you said, but would not be efficient), you can check for its type:
if isinstance(x, zip):
    x = list(x)