I am curious what the fastest way to consume an iterator would be, and the most Pythonic way.
For example, say that I want to create an iterator with the map builtin that accumulates something as a side effect. I don't actually care about the result of the map, just the side effect, so I want to blow through the iteration with as little overhead or boilerplate as possible. Something like:
    my_set = set()
    my_map = map(lambda x, y: my_set.add((x, y)), my_x, my_y)
In this example, I just want to blow through the iterator to accumulate things in my_set, and my_set is just an empty set until I actually run through my_map. Something like:
    for _ in my_map:
        pass
or a naked

    [_ for _ in my_map]

works, but they both feel clunky. Is there a more Pythonic way to make sure an iterator iterates quickly so that you can benefit from some side effect?
I tested the two methods above on the following:
    import numpy as np

    my_x = np.random.randint(100, size=int(1e6))
    my_y = np.random.randint(100, size=int(1e6))
with my_set and my_map as defined above. I got the following results with timeit:
    for _ in my_map: pass
    468 ms ± 20.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

    [_ for _ in my_map]
    476 ms ± 12.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
No real difference between the two, and they both feel clunky.
Note, I got similar performance with list(my_map), which was a suggestion in the comments.
An iterator or for-each loop is faster than a simple index-based loop for collections without random access; for collections that do allow random access, there is no performance difference between a for-each loop, an index-based loop, and an iterator.
Iterators are faster and more memory efficient. Think of the classic example of range(1000) vs. xrange(1000) in Python 2: range builds the full list in memory up front, while xrange produces one value at a time.
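To make that concrete in Python 3 terms (where xrange is gone and range is already lazy), the same contrast shows up between an eagerly built list and a lazy iterator. This is an illustrative sketch, not code from the original answer:

    import sys

    eager = list(range(1000))    # materializes all 1000 elements up front
    lazy = iter(range(1000))     # produces one element at a time on demand

    print(sys.getsizeof(eager))  # several kilobytes for the list object itself
    print(sys.getsizeof(lazy))   # a small, constant-size iterator object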
Technically speaking, a Python iterator object must implement two special methods, __iter__() and __next__(), collectively called the iterator protocol. An object is called iterable if we can get an iterator from it.
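As a minimal sketch of that protocol (an illustrative class, not something from the thread):

    class CountUpTo:
        """Yields 1, 2, ..., limit using the iterator protocol."""
        def __init__(self, limit):
            self.limit = limit
            self.current = 0

        def __iter__(self):
            # An iterator simply returns itself from __iter__().
            return self

        def __next__(self):
            if self.current >= self.limit:
                raise StopIteration
            self.current += 1
            return self.current

    print(list(CountUpTo(3)))  # [1, 2, 3]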
While you shouldn't be creating a map object just for side effects (a simpler alternative is sketched at the end of this answer), there is in fact a standard recipe for consuming iterators in the itertools docs:
    def consume(iterator, n=None):
        "Advance the iterator n-steps ahead. If n is None, consume entirely."
        # Use functions that consume iterators at C speed.
        if n is None:
            # feed the entire iterator into a zero-length deque
            collections.deque(iterator, maxlen=0)
        else:
            # advance to the empty slice starting at position n
            next(islice(iterator, n, n), None)
For just the "consume entirely" case, this can be simplified to
    def consume(iterator):
        collections.deque(iterator, maxlen=0)
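For example, applied to the kind of side-effect accumulation in the question (a sketch, with small lists standing in for the NumPy arrays):

    import collections

    def consume(iterator):
        collections.deque(iterator, maxlen=0)

    my_x = [1, 2, 3]
    my_y = [4, 5, 6]
    my_set = set()
    my_map = map(lambda x, y: my_set.add((x, y)), my_x, my_y)

    consume(my_map)
    print(my_set)  # {(1, 4), (2, 5), (3, 6)} -- order may vary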
Using collections.deque this way avoids storing all the elements (because maxlen=0) and iterates at C speed, without bytecode interpretation overhead. There's even a dedicated fast path in the deque implementation for using a maxlen=0 deque to consume an iterator.
Timing:
    In [1]: import collections

    In [2]: x = range(1000)

    In [3]: %%timeit
       ...: i = iter(x)
       ...: for _ in i:
       ...:     pass
       ...:
    16.5 µs ± 829 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

    In [4]: %%timeit
       ...: i = iter(x)
       ...: collections.deque(i, maxlen=0)
       ...:
    12 µs ± 566 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Of course, this is all based on CPython. The entire nature of interpreter overhead is very different on other Python implementations, and the maxlen=0 fast path is specific to CPython. See abarnert's answer for other Python implementations.
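As noted at the top, a map object used purely for its side effects is best avoided. For this particular accumulation, one simpler alternative (my sketch, not part of the original answer) is to build the set directly:

    import numpy as np

    my_x = np.random.randint(100, size=int(1e6))
    my_y = np.random.randint(100, size=int(1e6))

    # No throwaway iterator or lambda needed: construct the set of pairs directly.
    my_set = set(zip(my_x, my_y))

This keeps the intent (a set of (x, y) pairs) explicit and skips the per-element lambda call entirely.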