How do I ensure that a generator gets properly closed?

Consider a library function with the following signature:

from typing import Iterator

def get_numbers() -> Iterator[int]:
    ...

Let's look at some simple code that consumes it:

for i in get_numbers():
    print(i)

Nothing interesting so far. But let's say we don't care for even numbers. Only numbers that are odd, like us:

for i in get_numbers():
    if i & 1 == 0:
        raise ValueError("Ew, an even number!")
    print(i)

Now let's try an implementation of get_numbers:

def get_numbers() -> Iterator[int]:
    yield 1
    yield 2
    yield 3

Nothing very interesting here. The results of running our little for loop are pretty much what we'd expect:

>>> for i in get_numbers():
...     if i & 1 == 0:
...         raise ValueError("Ew, an even number!")
...     print(i)
1
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
ValueError: Ew, an even number!
>>>

We'd get the exact same results if get_numbers had a simpler implementation:

def get_numbers() -> Iterator[int]:
    return iter([1, 2, 3])

But let's instead suppose that get_numbers needs to remain a generator because it manages some resource.

def get_numbers() -> Iterator[int]:
    acquire_some_resource()
    try:
        yield 1
        yield 2
        yield 3
    finally:
        release_some_resource()

For our purposes, the resource we'll manage will just be text printed on the screen:

def acquire_some_resource() -> None:
    print("generating some numbers")

def release_some_resource() -> None:
    print("done generating numbers")

Our output is still predictable:

>>> for i in get_numbers():
...     if i & 1 == 0:
...         raise ValueError("Ew, an even number!")
...     print(i)
generating some numbers
1
done generating numbers
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
ValueError: Ew, an even number!
>>>

But what if we can't use a simple for loop? What if we want to ignore the first number, for example? (Let's pretend that itertools.islice isn't a thing.)

>>> it = get_numbers()
... next(it, None)
... for i in it:
...     if i & 1 == 0:
...         raise ValueError("Ew, an even number!")
...     print(i)
generating some numbers
Traceback (most recent call last):
  File "<stdin>", line 5, in <module>
ValueError: Ew, an even number!
>>>

Notice something? We acquired our resource, as evidenced by the text "generating some numbers", but we never released it.
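One way to see why is to ask the generator for its state with inspect.getgeneratorstate. The minimal sketch below inlines the question's acquire/release helpers as print calls (an assumption for brevity). After the loop body raises, the generator is still paused at yield 2 inside its try block, so its finally clause has not run; calling close() raises GeneratorExit at that paused yield, which is what finally triggers the release. Without an explicit close(), CPython only runs that finally clause if and when the generator object is garbage-collected.

```python
import inspect
from typing import Iterator


def get_numbers() -> Iterator[int]:
    print("generating some numbers")  # stands in for acquire_some_resource()
    try:
        yield 1
        yield 2
        yield 3
    finally:
        print("done generating numbers")  # stands in for release_some_resource()


it = get_numbers()
next(it, None)  # skip the first number
try:
    for i in it:
        if i & 1 == 0:
            raise ValueError("Ew, an even number!")
        print(i)
except ValueError:
    pass  # caught only so we can inspect the generator afterwards

# Still paused at `yield 2`, inside the try block: finally has not run.
print(inspect.getgeneratorstate(it))  # GEN_SUSPENDED

# close() raises GeneratorExit at the paused yield, so the finally clause runs.
it.close()
print(inspect.getgeneratorstate(it))  # GEN_CLOSED
```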

The right thing to do is to make sure the generator gets closed:

>>> it = get_numbers()
... try:
...     next(it, None)
...     for i in it:
...         if i & 1 == 0:
...             raise ValueError("Ew, an even number!")
...         print(i)
... finally:
...     it.close()
generating some numbers
done generating numbers
Traceback (most recent call last):
  File "<stdin>", line 6, in <module>
ValueError: Ew, an even number!
>>>

The problem with this approach is that it assumes that get_numbers() returns a generator, and thus has a close method. But its signature doesn't promise that. What if its implementation is the simpler one I gave earlier?

>>> def get_numbers() -> Iterator[int]:
...     return iter([1, 2, 3])
...
... it = get_numbers()
... try:
...     next(it, None)
...     for i in it:
...         if i & 1 == 0:
...             raise ValueError("Ew, an even number!")
...         print(i)
... finally:
...     it.close()
Traceback (most recent call last):
  File "<stdin>", line 12, in <module>
AttributeError: 'list_iterator' object has no attribute 'close'
>>>

So the right thing to do here is something pretty tedious:

it = get_numbers()
try:
    next(it, None)  # skip the first number
    for i in it:
        if i & 1 == 0:
            raise ValueError("Ew, an even number!")
        print(i)
finally:
    # Generators have close(); plain iterators such as list_iterator don't.
    if hasattr(it, "close"):
        it.close()

I can wrap this up in a context manager to make it simpler, but it feels like I'm doing something the language should be doing for me, or at minimum, that the callee should be concerning itself with, not the caller.
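A minimal sketch of that context manager, assuming a hypothetical helper named closing_if_possible (not a standard-library function). The standard contextlib.closing is not a drop-in replacement here, because its __exit__ calls close() unconditionally and would raise the same AttributeError on a list_iterator.

```python
import contextlib
from typing import Iterator, TypeVar

T = TypeVar("T")


@contextlib.contextmanager
def closing_if_possible(it: Iterator[T]) -> Iterator[Iterator[T]]:
    """Hypothetical helper: hand back `it`, then close it on exit if it can be closed."""
    try:
        yield it
    finally:
        close = getattr(it, "close", None)
        if callable(close):
            close()


def acquire_some_resource() -> None:
    print("generating some numbers")


def release_some_resource() -> None:
    print("done generating numbers")


def get_numbers() -> Iterator[int]:
    acquire_some_resource()
    try:
        yield 1
        yield 2
        yield 3
    finally:
        release_some_resource()


# Behaves the same whether get_numbers() is a generator or returns iter([1, 2, 3]).
try:
    with closing_if_possible(get_numbers()) as it:
        next(it, None)  # skip the first number
        for i in it:
            if i & 1 == 0:
                raise ValueError("Ew, an even number!")
            print(i)
except ValueError as exc:  # caught only so the example runs to completion
    print(exc)
```

With the generator version this prints "generating some numbers", then "done generating numbers", then the error message. With the list-iterator version the close() call is simply skipped. It still leaves the burden on the caller, which is the question's complaint.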

Is there a simpler way to handle this?

P Daddy asked Nov 06 '19 19:11


1 Answer

As my comment mentioned, one way to properly structure this would be to decorate your generator with contextlib.contextmanager:

from typing import Iterator
import contextlib

@contextlib.contextmanager
def get_numbers() -> Iterator[Iterator[int]]:  # yields a single Iterator[int], then cleans up
    acquire_some_resource()
    try:
        yield iter([1, 2, 3])
    finally:
        release_some_resource()

Then use it in a with statement:

with get_numbers() as et:
    for i in et:
        if i % 2 == 0:
            raise ValueError()
        else:
            print(i)

Result:

generating some numbers
1
done generating numbers
Traceback (most recent call last):
  File "<pyshell#64>", line 4, in <module>
    raise ValueError()
ValueError

This lets the contextmanager decorator manage your resources for you, without you having to worry about handling the release. If you're feeling courageous, you might even build your own context manager class with __enter__ and __exit__ methods to handle your resource.
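A class-based version might look like the minimal sketch below. The class name Numbers is hypothetical, and the acquire/release helpers are the question's print-based stand-ins. __enter__ acquires the resource and returns the iterator; __exit__ always releases it and returns None, so any exception raised in the with block still propagates.

```python
from types import TracebackType
from typing import Iterator, List, Optional, Type


def acquire_some_resource() -> None:
    print("generating some numbers")


def release_some_resource() -> None:
    print("done generating numbers")


class Numbers:
    """Hypothetical class-based equivalent of the decorated get_numbers()."""

    def __init__(self) -> None:
        self._values: List[int] = [1, 2, 3]

    def __enter__(self) -> Iterator[int]:
        acquire_some_resource()
        return iter(self._values)

    def __exit__(
        self,
        exc_type: Optional[Type[BaseException]],
        exc: Optional[BaseException],
        tb: Optional[TracebackType],
    ) -> None:
        # Runs whether the with block finished normally or raised.
        release_some_resource()
        # Returning None (falsy) means exceptions are not suppressed.


try:
    with Numbers() as numbers:
        for i in numbers:
            if i % 2 == 0:
                raise ValueError()
            print(i)
except ValueError:
    pass  # release_some_resource() has already run by this point
```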

I think the key takeaway here is that since your generator is expected to manage a resource, you should either use the with statement or always close it afterwards, much like f = open(...) should always be followed by f.close().
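For the question's original generator, the "with statement" route can be a sketch using the standard contextlib.closing, which calls close() on exit the way with open(...) closes a file. This assumes get_numbers() really returns a generator (generators always have close()); for a plain iterator without close(), closing() would raise AttributeError on exit, so the question's concern about the signature still applies.

```python
import contextlib
from typing import Iterator


def acquire_some_resource() -> None:
    print("generating some numbers")


def release_some_resource() -> None:
    print("done generating numbers")


def get_numbers() -> Iterator[int]:
    acquire_some_resource()
    try:
        yield 1
        yield 2
        yield 3
    finally:
        release_some_resource()


try:
    # closing() calls it.close() when the block exits, even on an exception.
    with contextlib.closing(get_numbers()) as it:
        next(it, None)  # skip the first number
        for i in it:
            if i & 1 == 0:
                raise ValueError("Ew, an even number!")
            print(i)
except ValueError as exc:  # caught only so the example runs to completion
    print(exc)
```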

r.ook answered Nov 02 '22 01:11