How to efficiently exhaust an iterator in a one-liner?

Tags:

python

If I have an iterator it and want to exhaust it, I can write:

for x in it:
    pass

Is there a builtin or standard library call that allows me to do it in a one-liner? Of course I could do:

list(it)

which will build a list from the iterator and then discard it. But I consider that inefficient because of the list-building step. Of course, it's trivial to write a helper function that does the empty for loop, but I am curious whether there is something else I am missing.

260

asked Apr 21 '16 07:04

hpk42

3 Answers

From the itertools recipes:

    # feed the entire iterator into a zero-length deque
    collections.deque(iterator, maxlen=0)
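
For context, here is a sketch of how that line is typically wrapped into a helper, modeled on the consume recipe in the itertools documentation. The optional n parameter and the islice branch follow my recollection of that recipe, so treat the details as an assumption rather than a verbatim copy.

```python
from collections import deque
from itertools import islice


def consume(iterator, n=None):
    """Advance the iterator n steps ahead; if n is None, consume it entirely."""
    if n is None:
        # Feed the entire iterator into a zero-length deque (runs at C speed).
        deque(iterator, maxlen=0)
    else:
        # Advance to the empty slice starting at position n.
        next(islice(iterator, n, n), None)


it = iter(range(10))
consume(it, 3)              # skip the first three items
first_after_skip = next(it)
consume(it)                 # exhaust whatever is left
leftover = next(it, None)   # the default is returned once the iterator is empty
```

With the helper in place, the one-liner at the call site is just consume(it).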

176

answered Oct 06 '22 10:10

Ignacio Vazquez-Abrams

2022 update (bounty asks): There's no "dedicated function" for it in the standard library, and deque(it, 0) is still the most efficient. That's why it's used in the consume recipe in the itertools documentation and in the consume function of more-itertools.

Benchmark of the various proposals, iteration time per element, iterating itertools.repeat(None, 10**5) (with CPython 3.10):

  2.7 ns ± 0.1 ns consume_deque
  6.5 ns ± 0.0 ns consume_loop
  6.5 ns ± 0.0 ns consume_all_if_False
 13.9 ns ± 0.3 ns consume_object_in
 27.0 ns ± 0.1 ns consume_all_True
 29.4 ns ± 0.3 ns consume_sum_0
 44.8 ns ± 0.1 ns consume_reduce

The deque version wins because it is implemented in C and has a fast path for maxlen == 0 that does nothing with the elements.
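
To make the maxlen behaviour concrete, here is a small sketch (the variable names are just for illustration). A bounded deque keeps only the last maxlen items it is fed, so with maxlen=0 it keeps nothing, yet the iterator is still run to the end.

```python
from collections import deque

it = iter(range(10))
d = deque(it, maxlen=0)             # consumes every item, stores none

is_exhausted = next(it, "done") == "done"

# For comparison: a non-zero maxlen keeps only the most recent items.
tail = deque(range(10), maxlen=2)   # holds 8 and 9
```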

The simple loop takes second place and is the fastest of the solutions that iterate in Python. The other solutions previously proposed here waste time by doing extra work with each element. I added consume_all_if_False to show how to do an all/sum efficiently: add an if False clause so that your generator doesn't produce anything.
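
A short sketch of why the if False clause works, with made-up sample data. The generator expression still pulls every item from the underlying iterator, but the filter rejects all of them, so all() or sum() receives an empty stream and has nothing to process. As a side note of my own (not from the benchmark), it also means all() can never short-circuit, which a plain all(it) would do at the first falsy item.

```python
# Plain all() stops at the first falsy item and leaves the rest unconsumed.
it = iter([1, 0, 2])
all(x for x in it)
remaining_plain = list(it)          # the 2 was never reached

# With "if False", nothing is yielded, so the iterator is always run dry.
it = iter([1, 0, 2])
result_all = all(x for x in it if False)   # all() of an empty stream is True
remaining_filtered = list(it)

it = iter(range(5))
result_sum = sum(0 for _ in it if False)   # sum() of an empty stream is 0
is_exhausted = next(it, None) is None
```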

Benchmark code:

from collections import deque
from functools import reduce
from itertools import repeat
from random import shuffle
from statistics import mean, stdev
from timeit import default_timer as timer

def consume_loop(it):
    for _ in it:
        pass

def consume_deque(it):
    deque(it, 0)

def consume_object_in(it):
    object() in it

def consume_all_True(it):
    all(True for _ in it)

def consume_all_if_False(it):
    all(_ for _ in it if False)

def consume_sum_0(it):
    sum(0 for _ in it)

def consume_reduce(it):
    reduce(lambda x, y: y, it)

funcs = [
    consume_loop,
    consume_deque,
    consume_object_in,
    consume_all_True,
    consume_all_if_False,
    consume_sum_0,
    consume_reduce,
]

# Per-element timings (in seconds) collected for each function.
times = {f: [] for f in funcs}

def stats(f):
    # Summarize the five fastest runs, converted to nanoseconds.
    ts = [t * 1e9 for t in sorted(times[f])[:5]]
    return f'{mean(ts):5.1f} ns ± {stdev(ts):3.1f} ns'

# Run 25 rounds, shuffling the order of the functions each round.
for _ in range(25):
    shuffle(funcs)
    for f in funcs:
        n = 10**5
        it = repeat(None, n)
        t0 = timer()
        f(it)
        t1 = timer()
        times[f].append((t1 - t0) / n)

# Print the results, sorted by their formatted statistics string.
for f in sorted(funcs, key=stats):
    print(stats(f), f.__name__)

answered Oct 06 '22 12:10

Kelly Bundy

Note that your suggestion can also be formulated as a one-liner:

for _ in it: pass

And I just made:

def exhaust(it):
    for _ in it:
        pass

It's not as fast as the deque solution (10% slower on my laptop), but I find it cleaner.

answered Oct 06 '22 11:10

Yuval
