I am curious what the fastest way to consume an iterator would be, and the most Pythonic way.
For example, say that I want to create an iterator with the map builtin that accumulates something as a side effect. I don't actually care about the result of the map, just the side effect, so I want to blow through the iteration with as little overhead or boilerplate as possible. Something like:
    my_set = set()
    my_map = map(lambda x, y: my_set.add((x, y)), my_x, my_y)
In this example, I just want to blow through the iterator to accumulate things in my_set, and my_set is just an empty set until I actually run through my_map. Something like:
    for _ in my_map:
        pass
or a naked

    [_ for _ in my_map]

works, but they both feel clunky. Is there a more Pythonic way to make sure an iterator iterates quickly so that you can benefit from some side effect?
I tested the two methods above on the following:
    import numpy as np

    my_x = np.random.randint(100, size=int(1e6))
    my_y = np.random.randint(100, size=int(1e6))
with my_set and my_map as defined above. I got the following results with timeit:
    for _ in my_map: pass
    468 ms ± 20.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

    [_ for _ in my_map]
    476 ms ± 12.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
No real difference between the two, and they both feel clunky.
Note, I got similar performance with list(my_map), which was a suggestion in the comments.
An iterator or for-each loop is faster than a simple index-based loop for collections without random access; for collections that do allow random access, there is no performance difference between a for-each loop, an index-based loop, and an iterator.
Iterators are faster and more memory efficient. Think of the classic example of range(1000) vs. xrange(1000) in Python 2: range builds the full list in memory up front, while xrange produces one value at a time.
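To make that concrete in Python 3 terms (where xrange is gone and range is already lazy), the same contrast shows up between an eagerly built list and a lazy iterator. This is an illustrative sketch, not code from the original answer:

    import sys

    eager = list(range(1000))    # materializes all 1000 elements up front
    lazy = iter(range(1000))     # produces one element at a time on demand

    print(sys.getsizeof(eager))  # several kilobytes for the list object itself
    print(sys.getsizeof(lazy))   # a small, constant-size iterator object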
Technically speaking, a Python iterator object must implement two special methods, __iter__() and __next__(), collectively called the iterator protocol. An object is called iterable if we can get an iterator from it.
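As a minimal sketch of that protocol (an illustrative class, not something from the thread):

    class CountUpTo:
        """Yields 1, 2, ..., limit using the iterator protocol."""
        def __init__(self, limit):
            self.limit = limit
            self.current = 0

        def __iter__(self):
            # An iterator simply returns itself from __iter__().
            return self

        def __next__(self):
            if self.current >= self.limit:
                raise StopIteration
            self.current += 1
            return self.current

    print(list(CountUpTo(3)))  # [1, 2, 3]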
While you shouldn't be creating a map object just for side effects (a simpler alternative is sketched at the end of this answer), there is in fact a standard recipe for consuming iterators in the itertools docs:
    def consume(iterator, n=None):
        "Advance the iterator n-steps ahead. If n is None, consume entirely."
        # Use functions that consume iterators at C speed.
        if n is None:
            # feed the entire iterator into a zero-length deque
            collections.deque(iterator, maxlen=0)
        else:
            # advance to the empty slice starting at position n
            next(islice(iterator, n, n), None)
For just the "consume entirely" case, this can be simplified to
    def consume(iterator):
        collections.deque(iterator, maxlen=0)
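For example, applied to the kind of side-effect accumulation in the question (a sketch, with small lists standing in for the NumPy arrays):

    import collections

    def consume(iterator):
        collections.deque(iterator, maxlen=0)

    my_x = [1, 2, 3]
    my_y = [4, 5, 6]
    my_set = set()
    my_map = map(lambda x, y: my_set.add((x, y)), my_x, my_y)

    consume(my_map)
    print(my_set)  # {(1, 4), (2, 5), (3, 6)} -- order may vary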
Using collections.deque this way avoids storing all the elements (because maxlen=0) and iterates at C speed, without bytecode interpretation overhead. There's even a dedicated fast path in the deque implementation for using a maxlen=0 deque to consume an iterator.
Timing:
    In [1]: import collections

    In [2]: x = range(1000)

    In [3]: %%timeit
       ...: i = iter(x)
       ...: for _ in i:
       ...:     pass
       ...:
    16.5 µs ± 829 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

    In [4]: %%timeit
       ...: i = iter(x)
       ...: collections.deque(i, maxlen=0)
       ...:
    12 µs ± 566 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Of course, this is all based on CPython. The entire nature of interpreter overhead is very different on other Python implementations, and the maxlen=0 fast path is specific to CPython. See abarnert's answer for other Python implementations.
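As noted at the top, a map object used purely for its side effects is best avoided. For this particular accumulation, one simpler alternative (my sketch, not part of the original answer) is to build the set directly:

    import numpy as np

    my_x = np.random.randint(100, size=int(1e6))
    my_y = np.random.randint(100, size=int(1e6))

    # No throwaway iterator or lambda needed: construct the set of pairs directly.
    my_set = set(zip(my_x, my_y))

This keeps the intent (a set of (x, y) pairs) explicit and skips the per-element lambda call entirely.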