 

When do you use iterators and when generators? [closed]

I read some questions and answers about the differences between iterators and generators, but I don't understand when you should choose one over the other. Do you know any simple, real-life examples where one is better than the other? Thank you.

clappersturdy asked May 21 '26 09:05



1 Answer

Iterators provide efficient ways of iterating over an existing data structure.

Generators provide efficient ways of generating elements of a sequence on the fly.
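To make the distinction concrete, here is a minimal sketch that produces the same sequence two ways: as a hand-written iterator class implementing __iter__ and __next__, and as a generator function. The names Countdown and countdown are hypothetical and chosen only for illustration.

```python
class Countdown:
    """Hand-written iterator: counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


def countdown(start):
    """Generator function producing the same sequence."""
    while start > 0:
        yield start
        start -= 1


print(list(Countdown(3)))  # [3, 2, 1]
print(list(countdown(3)))  # [3, 2, 1]
```

The generator version is usually shorter because Python keeps track of the state between values for you; the class version can be worth the extra code when the object needs additional methods or attributes.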

Iterator Example

Python's file objects can be used as iterators. To process a file line by line, you might write:

with open('file.txt', 'rb') as fh:
    lines = fh.readlines()  # this reads the entire file into lines, now
    for line in lines:
        process(line)       # something to be done on each line

You can implement this more efficiently by iterating over the file object directly:

with open('file.txt', 'rb') as fh:
    for line in fh:         # this will only read as much as needed, each time
        process(line)

The advantage is that the second example does not read the entire file into memory and then iterate over a list of lines. Instead, the reader (a BufferedReader in Python 3) hands back one line at a time, each time you ask for one.
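One rough way to see this behaviour without a real file is to use io.BytesIO as an in-memory stand-in. That substitution is an assumption made for illustration; the sample data is made up.

```python
import io

# In-memory stand-in for a file opened in binary mode
fh = io.BytesIO(b"first\nsecond\nthird\n")

# A file-like object is its own iterator
is_own_iterator = iter(fh) is fh
print(is_own_iterator)   # True

# Asking for one value consumes exactly one line
first = next(fh)
print(first)             # b'first\n'

# The remaining lines are still unread
rest = fh.read()
print(rest)              # b'second\nthird\n'
```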

Generator Example

Generators generate elements of a sequence on the fly. Consider the following:

def fib():
    idx  = 0
    vals = [0,1]
    while True:
        # If we need to compute a new value, do so on the fly
        if len(vals) <= idx:
            vals.append(vals[-1] + vals[-2])
        yield vals[idx]
        idx += 1

This is an example of a generator. In this case, every time it's "called" it produces the next number in the Fibonacci sequence.

I put "called" in scare quotes because getting successive values from a generator works differently from calling a traditional function.

We have two main ways to get values from generators:

Iterating over it

# Print the fibonacci sequence until some event occurs
for f in fib():
    print(f)
    if f > 100: break

Here we use the for ... in syntax to iterate over the generator and print each value it yields, stopping right after the first value greater than 100 has been printed.

Output:

0
1
1
2
3
5
8
13
21
34
55
89
144

Calling next()

We could also call next on the generator (since generators are iterators) and (generate and) access the values that way:

f = fib()

print(next(f))  # 0
print(next(f))  # 1
print(next(f))  # 1
print(next(f))  # 2
print(next(f))  # 3
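If you want a fixed number of values from an infinite generator, itertools.islice from the standard library is a common alternative to repeated next() calls. The sketch below restates fib in a compact form so that it runs on its own; it is a simplified variant that keeps only the last two values, not the exact function shown earlier.

```python
from itertools import islice


def fib():
    # Compact variant: only the last two values are kept in memory
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b


# islice stops the otherwise infinite generator after 10 values
first_ten = list(islice(fib(), 10))
print(first_ten)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```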

There are more persuasive examples of generators, however. These often come in the form of "generator expressions", a related concept (PEP 289).

Consider something like the following:

first = any((expensive_thing(i) for i in range(100)))

Here, we're creating a generator expression:

(expensive_thing(i) for i in range(100))

And passing it to the any built-in function. any returns True as soon as it finds a truthy element in the iterable. So when you pass a generator expression to any, it only calls expensive_thing(i) as many times as necessary to find a truthy value.

Compare this with using a list comprehension passed to any:

first = any([expensive_thing(i) for i in range(100)])

In this case, expensive_thing(i) is called for all values of i first. Only then is the 100-element list of True/False values passed to any, which returns True if it finds a truthy value.

But if expensive_thing(0) returned True, clearly the better approach would be to evaluate only that, test it, and stop there. Generators allow you to do this, whereas something like a list comprehension does not.


Consider the following example, illustrating the advantage of using a generator expression over list comprehension:

import time

def expensive_thing(n):
    time.sleep(0.1)
    return 10 < n < 20

# Find first True value, by using a generator expression
t0 = time.time()
print( any((expensive_thing(i) for i in range(100))) )
t1 = time.time()
td1 = t1-t0

# Find first True value, by using a list comprehension
t0 = time.time()
print( any([expensive_thing(i) for i in range(100)]) )
t1 = time.time()
td2 = t1-t0

print("TD 1:", td1)  # TD 1:  1.213068962097168
print("TD 2:", td2)  # TD 2: 10.000572204589844

The function expensive_thing introduces an artificial delay to illustrate the difference between the two approaches. The second (list comprehension) approach takes significantly longer because expensive_thing is evaluated at all 100 indices, whereas the first only calls expensive_thing until it finds a True value (at i = 11, after 12 calls).
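Timings vary from machine to machine, so a deterministic way to check the same point is to count calls instead of measuring time. This is a minimal sketch; the calls list is a hypothetical helper added for illustration, and the sleep is dropped so the example runs instantly.

```python
calls = []  # records every argument expensive_thing is called with


def expensive_thing(n):
    calls.append(n)
    return 10 < n < 20


# Generator expression: any() stops at the first True result
any(expensive_thing(i) for i in range(100))
lazy_calls = len(calls)
print(lazy_calls)   # 12

calls.clear()

# List comprehension: all 100 results are computed before any() runs
any([expensive_thing(i) for i in range(100)])
eager_calls = len(calls)
print(eager_calls)  # 100
```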

jedwards answered May 22 '26 23:05



