How to efficiently exhaust an iterator in a one-liner?

Tags:

python

If I have an iterator it and want to exhaust it, I can write:

for x in it:
    pass

Is there a builtin or standard-library call that lets me do this in a one-liner? Of course, I could do:

list(it)

which builds a list from the iterator only to discard it. I consider that inefficient because of the list-building step. It is, of course, trivial to write a helper function that does the empty for loop, but I am curious whether there is something I am missing.

asked Apr 21 '16 by hpk42


3 Answers

From the itertools recipes:

    import collections

    # Feed the entire iterator into a zero-length deque.
    collections.deque(iterator, maxlen=0)
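A small illustration of what the recipe does, assuming a hypothetical generator noisy() whose side effects are the reason you want it exhausted: the zero-length deque pulls every element (so the side effects run) but stores nothing, and the generator is empty afterwards.

```python
import collections

log = []

def noisy(n):
    # Hypothetical generator: its side effects only happen when it is iterated.
    for i in range(n):
        log.append(i)
        yield i

gen = noisy(3)
result = collections.deque(gen, maxlen=0)
print(log)                     # [0, 1, 2]: every element was produced
print(len(result))             # 0: none of them were kept
print(next(gen, "exhausted"))  # exhausted: nothing is left in the generator
```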
answered Oct 06 '22 by Ignacio Vazquez-Abrams


2022 update (in response to the bounty): There is no dedicated function for this in the standard library, and deque(it, 0) is still the most efficient option. That's why it is used in the consume recipe from the itertools documentation and in more-itertools's consume function (see its source).
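For reference, the consume recipe looks roughly like the sketch below (written from memory of the itertools documentation, so check the current docs for the exact wording). The n=None branch is the zero-length-deque trick; the other branch skips ahead by n elements by taking an empty islice that starts at position n.

```python
import collections
from itertools import islice

def consume(iterator, n=None):
    "Advance the iterator n steps ahead. If n is None, consume it entirely."
    if n is None:
        # Feed the entire iterator into a zero-length deque.
        collections.deque(iterator, maxlen=0)
    else:
        # Advance to the empty slice starting at position n.
        next(islice(iterator, n, n), None)

it = iter(range(10))
consume(it, 3)
print(next(it))          # 3: the first three elements were skipped
consume(it)
print(next(it, "done"))  # done: the rest was consumed
```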

Benchmark of the various proposals: time per element when iterating itertools.repeat(None, 10**5), with CPython 3.10:

  2.7 ns ± 0.1 ns consume_deque
  6.5 ns ± 0.0 ns consume_loop
  6.5 ns ± 0.0 ns consume_all_if_False
 13.9 ns ± 0.3 ns consume_object_in
 27.0 ns ± 0.1 ns consume_all_True
 29.4 ns ± 0.3 ns consume_sum_0
 44.8 ns ± 0.1 ns consume_reduce

The deque solution wins because it is implemented in C and has a fast path for maxlen == 0 that does nothing with the elements.
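To see the maxlen semantics concretely: maxlen=0 retains nothing, while maxlen=1 retains only the final element, which is what consume_reduce effectively computes. The helper last() below is a hypothetical name used only for illustration, not a standard-library function.

```python
from collections import deque

def last(iterable, default=None):
    # Hypothetical helper: a deque with maxlen=1 discards older items as new
    # ones arrive, so after exhausting the input it holds only the last item.
    tail = deque(iterable, maxlen=1)
    return tail[0] if tail else default

empty = deque(iter(range(1_000_000)), maxlen=0)
print(len(empty))               # 0: nothing was retained
print(last(iter("abc")))        # c
print(last(iter([]), "none"))   # none: empty input falls back to the default
```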

The simple loop comes second; it is the fastest of the approaches that iterate in Python code. The other solutions previously proposed here waste time by doing extra work with each element, some more than others. I added consume_all_if_False to show how to make an all/sum approach efficient: add an if False clause so that your generator doesn't produce anything.
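A quick check of the if False trick, using a hypothetical tracked() generator to record which elements were produced: the inner for clause still pulls every element from the source, but the generator expression yields nothing, so all() returns True and sum() returns 0.

```python
seen = []

def tracked(n):
    # Hypothetical source iterator that records each element it produces.
    for i in range(n):
        seen.append(i)
        yield i

# "if False" filters out every element, so the generator expression yields
# nothing, yet its for clause still exhausts tracked(4).
r1 = all(_ for _ in tracked(4) if False)
print(r1, seen)  # True [0, 1, 2, 3]

seen.clear()
r2 = sum(0 for _ in tracked(4) if False)
print(r2, seen)  # 0 [0, 1, 2, 3]
```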

Benchmark code:

from collections import deque
from functools import reduce
from itertools import repeat
from random import shuffle
from statistics import mean, stdev
from timeit import default_timer as timer

def consume_loop(it):
    for _ in it:
        pass

def consume_deque(it):
    deque(it, 0)

def consume_object_in(it):
    object() in it

def consume_all_True(it):
    all(True for _ in it)

def consume_all_if_False(it):
    all(_ for _ in it if False)

def consume_sum_0(it):
    sum(0 for _ in it)

def consume_reduce(it):
    reduce(lambda x, y: y, it)

funcs = [
    consume_loop,
    consume_deque,
    consume_object_in,
    consume_all_True,
    consume_all_if_False,
    consume_sum_0,
    consume_reduce,
]

# Per-element times (in seconds) collected for each function.
times = {f: [] for f in funcs}

def stats(f):
    # Mean and standard deviation of the 5 fastest runs, in nanoseconds.
    ts = [t * 1e9 for t in sorted(times[f])[:5]]
    return f'{mean(ts):5.1f} ns ± {stdev(ts):3.1f} ns'

n = 10**5
for _ in range(25):
    shuffle(funcs)  # vary the order to reduce systematic bias
    for f in funcs:
        it = repeat(None, n)
        t0 = timer()
        f(it)
        t1 = timer()
        times[f].append((t1 - t0) / n)

for f in sorted(funcs, key=stats):
    print(stats(f), f.__name__)
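One caveat worth knowing about consume_object_in: the in operator stops at the first element that compares equal to the probe. A fresh object() normally equals nothing, but an element with a permissive __eq__ (the AlwaysEqual class below is a contrived, hypothetical example) ends the scan early and leaves the iterator only partly consumed.

```python
class AlwaysEqual:
    # Hypothetical pathological element: claims equality with anything.
    def __eq__(self, other):
        return True

it = iter([1, AlwaysEqual(), 2, 3])
found = object() in it   # stops at AlwaysEqual(), which "equals" the probe
remaining = list(it)     # elements after the match were never consumed
print(found, remaining)  # True [2, 3]
```

The deque and plain-loop approaches never compare elements, so they do not have this problem.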
answered Oct 06 '22 by Kelly Bundy


Note that your suggestion can also be formulated as a one-liner:

for _ in it: pass

Personally, I just wrote a helper:

def exhaust(it):
    for _ in it:
        pass

It's not as fast as the deque solution (10% slower on my laptop), but I find it cleaner.
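Since relative timings depend on the Python version and the machine, here is a minimal timeit-based sketch for comparing exhaust() against the deque approach yourself. time_consumer() and deque_exhaust() are hypothetical helpers, and the sizes are arbitrary.

```python
from collections import deque
from itertools import repeat
from timeit import timeit

def exhaust(it):
    for _ in it:
        pass

def deque_exhaust(it):
    deque(it, maxlen=0)

def time_consumer(consumer, n=10**5, number=20):
    # Build a fresh iterator inside each timed call, since an exhausted
    # iterator cannot be reused; creating repeat() itself is cheap.
    return timeit(lambda: consumer(repeat(None, n)), number=number)

loop_time = time_consumer(exhaust)
deque_time = time_consumer(deque_exhaust)
print(f"loop:  {loop_time:.4f} s")
print(f"deque: {deque_time:.4f} s")
print(f"ratio: {loop_time / deque_time:.2f}x")
```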

answered Oct 06 '22 by Yuval