The recipes section of the itertools docs begins with this text:
The extended tools offer the same high performance as the underlying toolset. The superior memory performance is kept by processing elements one at a time rather than bringing the whole iterable into memory all at once. Code volume is kept small by linking the tools together in a functional style which helps eliminate temporary variables. High speed is retained by preferring “vectorized” building blocks over the use of for-loops and generators which incur interpreter overhead.
The question is: how should generators be constructed to avoid this overhead? Could you provide some examples of poorly constructed blocks that incur it?
I decided to ask while answering this question, where I couldn't say for sure whether chain(sequence, [obj]) has overhead compared to chain(sequence, repeat(obj, 1)), and whether I should prefer the latter.
The doc text is not about how to construct generators to avoid the overhead. It explains that correctly written itertools-using code, such as the code provided in the recipes, eschews for-loops and generators altogether, leaving it to itertools or built-in collectors (such as list) to consume the iterators.
Take, for example, the tabulate recipe:
def tabulate(function, start=0):
    "Return function(0), function(1), ..."
    return imap(function, count(start))
The non-vectorized way to write this would have been:
def tabulate(function, start=0):
    i = start
    while True:
        yield function(i)
        i += 1
This version "incurs interpreter overhead" because the looping and the function call are done in Python bytecode rather than in C.
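To make the contrast concrete, here is a Python 3 sketch of both versions side by side (using map, the Python 3 counterpart of Python 2's imap, and hypothetical names for the two variants). Both produce the same infinite stream; only where the loop runs differs:

```python
from itertools import count, islice

def tabulate_vectorized(function, start=0):
    # The looping happens inside map() and count(), which are
    # implemented in C: no per-element interpreter dispatch.
    return map(function, count(start))

def tabulate_generator(function, start=0):
    # The looping happens in Python: every iteration re-enters the
    # interpreter loop, which is the "interpreter overhead".
    i = start
    while True:
        yield function(i)
        i += 1

# islice takes a finite prefix of each infinite stream.
squares_fast = list(islice(tabulate_vectorized(lambda x: x * x), 5))
squares_slow = list(islice(tabulate_generator(lambda x: x * x), 5))
print(squares_fast)  # [0, 1, 4, 9, 16]
```

The results are identical; the vectorized version simply delegates all per-element work to C code.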
Regarding chaining a single element, it is safe to assume that chain(sequence, [obj]) will be (trivially) faster because fixed-length list construction is well-optimized in Python, using specialized syntax and opcodes. In the same vein, chain(sequence, (obj,)) would be faster still, since tuples share the optimizations of lists and are smaller to boot. As always with benchmarking, it is much better to measure with python -m timeit than to guess.
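For instance, a minimal benchmark sketch using the timeit module (the variant names and element counts here are arbitrary choices, not anything from the original question) could look like this:

```python
from itertools import chain, repeat
from timeit import timeit

seq = list(range(1000))

# Three spellings of "the sequence followed by one extra element".
# Each lambda builds a fresh chain, since chain iterators are single-use.
variants = {
    "list":   lambda: chain(seq, [None]),
    "tuple":  lambda: chain(seq, (None,)),
    "repeat": lambda: chain(seq, repeat(None, 1)),
}

# All three yield exactly the same elements; only creation cost differs.
results = {name: list(make()) for name, make in variants.items()}

for name, make in variants.items():
    t = timeit(lambda: list(make()), number=1_000)
    print(f"{name:8s} {t:.4f}s")
```

The absolute timings will vary by machine and Python version, which is exactly why measuring beats guessing.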
The documentation quote does not concern itself with differences in iterator creation, such as the one between repeat(foo, 1) and [foo]. Since the iterator only ever produces a single element, it makes no difference how it is consumed. The docs speak of processor and memory efficiency when dealing with iterators that can produce millions of elements. Compared to that, choosing the iterator that is faster to create is a trivial concern, because the creation strategy can be changed at any time. On the other hand, once the code is designed around explicit loops that don't vectorize, it can be very hard to change later without a full rewrite.