I read some questions and answers about the differences between iterators and generators, but I don't understand when you should choose one over the other. Do you know any simple, real-life examples where one is better than the other? Thank you.
Iterators provide efficient ways of iterating over an existing data structure.
Generators provide efficient ways of generating elements of a sequence on the fly.
Python's file objects can be used as iterators. A first attempt at processing each line of a file might look like this:
with open('file.txt', 'rb') as fh:
    lines = fh.readlines()  # this reads the entire file into a list of lines
    for line in lines:
        process(line)  # something to be done on each line
You can implement this more efficiently by iterating over the file object directly:
with open('file.txt', 'rb') as fh:
    for line in fh:  # this will only read as much as needed, each time
        process(line)
The advantage is that the second example never reads the entire file into memory and then iterates over a list of lines. Instead, the reader (a BufferedReader in Python 3) hands you one line at a time, each time you ask for one.
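To make the "one line at a time" behaviour visible, here is a minimal sketch that drives a file-like object by hand with next(). It uses io.BytesIO as an in-memory stand-in for a file opened in 'rb' mode, and the contents are made up for illustration:

```python
import io

# In-memory stand-in for a file opened with open(..., 'rb'); the contents are made up.
fh = io.BytesIO(b"first\nsecond\nthird\n")

# A file object is its own iterator, which is why "for line in fh" works.
assert iter(fh) is fh

first = next(fh)       # returns the first line only
second = next(fh)      # returns the next line, on request
remaining = list(fh)   # consumes whatever is left

print(first, second, remaining)
```

A for loop does the same thing behind the scenes: it keeps calling next() until the iterator signals that it is exhausted.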
Generators generate elements of a sequence on the fly. Consider the following:
def fib():
    idx = 0
    vals = [0, 1]
    while True:
        # If we need to compute a new value, do so on the fly
        if len(vals) <= idx:
            vals.append(vals[-1] + vals[-2])
        yield vals[idx]
        idx += 1
This is an example of a generator. In this case, every time it's "called" it produces the next number in the Fibonacci sequence.
I put "called" in scare quotes because the method of getting successive values from generators is different than a traditional function.
We have two main ways to get values from generators:
Iterating over it
# Print the Fibonacci sequence until some event occurs
for f in fib():
    print(f)
    if f > 100:
        break
Here we use the for ... in syntax to iterate over the generator and print the values it yields, until we get a value that's greater than 100.
Output:
0 1 1 2 3 5 8 13 21 34 55 89 144
Calling next()
We could also call next on the generator (since generators are iterators) and (generate and) access the values that way:
f = fib()
print(next(f)) # 0
print(next(f)) # 1
print(next(f)) # 1
print(next(f)) # 2
print(next(f)) # 3
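Because a generator is an iterator, the standard iterator tools work on it too. As a rough sketch, itertools.islice can take the first few values from an infinite generator without a manual break. The fib below is a simplified variant written for this example (it keeps only the last two values instead of a list), not the version shown earlier:

```python
from itertools import islice

def fib():
    """Yield Fibonacci numbers forever, keeping only the last two values."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

gen = fib()
# A generator is its own iterator.
assert iter(gen) is gen

# Take just the first ten values; nothing beyond them is computed.
first_ten = list(islice(fib(), 10))
print(first_ten)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```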
There are more persuasive examples of generators, however, and these often come in the form of "generator expressions", a related concept (PEP 289).
Consider something like the following:
first = any((expensive_thing(i) for i in range(100)))
Here, we're creating a generator expression:
(expensive_thing(i) for i in range(100))
And passing it to the any built-in function. any will return True as soon as an element of the iterable is determined to be truthy. So when you pass a generator expression to any, it will only call expensive_thing(i) as many times as necessary to find a truthy value.
Compare this with using a list comprehension passed to any:
first = any([expensive_thing(i) for i in range(100)])
In this case, expensive_thing(i) is first called for all values of i, and only then is the 100-element list of True/False values given to any, which returns True if it finds a truthy value.
But if expensive_thing(0) returned True, clearly the better approach would be to evaluate only that, test it, and stop there. Generators allow you to do this, whereas something like a list comprehension does not.
Consider the following example, illustrating the advantage of using a generator expression over list comprehension:
import time

def expensive_thing(n):
    time.sleep(0.1)
    return 10 < n < 20

# Find first True value, by using a generator expression
t0 = time.time()
print(any((expensive_thing(i) for i in range(100))))
t1 = time.time()
td1 = t1 - t0

# Find first True value, by using a list comprehension
t0 = time.time()
print(any([expensive_thing(i) for i in range(100)]))
t1 = time.time()
td2 = t1 - t0

print("TD 1:", td1)  # TD 1: 1.213068962097168
print("TD 2:", td2)  # TD 2: 10.000572204589844
The function expensive_thing introduces an artificial delay to illustrate the difference between the two approaches. The second (list comprehension) approach takes significantly longer, because expensive_thing is evaluated at all 100 indices, whereas the first only calls expensive_thing until it finds a True value (at i=11, so 12 calls in total).
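The same difference can be checked without relying on timing, by counting calls. This is only a sketch: counted_thing and seen are hypothetical names introduced here, standing in for expensive_thing with the sleep removed.

```python
seen = []  # records every argument the function is called with

def counted_thing(n):
    """Hypothetical stand-in for expensive_thing that logs each call."""
    seen.append(n)
    return 10 < n < 20

# Generator expression: any() stops at the first truthy result.
seen.clear()
any(counted_thing(i) for i in range(100))
lazy_calls = len(seen)

# List comprehension: every element is computed before any() runs.
seen.clear()
any([counted_thing(i) for i in range(100)])
eager_calls = len(seen)

print(lazy_calls, eager_calls)  # 12 100
```

The 12 calls correspond to i = 0 through 11, which matches the roughly 1.2 seconds measured above at 0.1 seconds per call.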