Elegant mechanism to clean up a generator as it goes out of scope?

I am using several generators inside a heap queue to iterate through sorted files on disk. Often the heap queue is not completely drained before it goes out of scope, so the underlying generators never reach StopIteration.

I'd like to be able to attach a handler to the generator, or use some other elegant mechanism, to delete the files on disk when the generator goes out of scope. The files themselves are temporary, so it's fine to delete them. However, if they're not deleted, the program will eventually fill up the disk with temporary files. Below is the generator for reference:

import struct

def _read_score_index_from_disk(file_name, buffer_size=8*10000):
    """Generator to yield (float, int) tuples from a file; does buffering
    and file management to avoid keeping the file open while the generator
    is not being advanced. buffer_size should be a multiple of 8 so that
    no record is split across two reads."""

    file_buffer = b''
    file_offset = 0
    buffer_offset = 0

    while True:
        # Refill once the buffer is used up; with '>' instead of '>=' the
        # loop would stop after the first full buffer.
        if buffer_offset >= len(file_buffer):
            with open(file_name, 'rb') as data_file:
                data_file.seek(file_offset)
                file_buffer = data_file.read(buffer_size)
            file_offset += buffer_size
            buffer_offset = 0
        packed_score = file_buffer[buffer_offset:buffer_offset+8]
        buffer_offset += 8
        if not packed_score:
            break  # end of file
        yield struct.unpack('fi', packed_score)

I'm aware of the atexit handler, but it doesn't work in my case, since this code is meant to run in a long-running process.
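A minimal sketch of the situation described above, assuming each temporary file holds records packed with struct.pack('fi', score, index) and already sorted by score. The helpers write_sorted_file and read_records are hypothetical; read_records is a simplified, unbuffered stand-in for _read_score_index_from_disk and, unlike it, keeps the file open while suspended. Stopping the merged iteration early leaves the files on disk.

```python
import heapq
import itertools
import os
import struct
import tempfile

RECORD = struct.Struct('fi')  # (float score, int index), native layout as in the question


def write_sorted_file(records):
    """Hypothetical helper: write (score, index) pairs sorted by score to a temp file."""
    fd, path = tempfile.mkstemp(suffix='.scores')
    with os.fdopen(fd, 'wb') as f:
        for score, index in sorted(records):
            f.write(RECORD.pack(score, index))
    return path


def read_records(path):
    """Simplified, unbuffered stand-in for _read_score_index_from_disk."""
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(RECORD.size)
            if len(chunk) < RECORD.size:
                break
            yield RECORD.unpack(chunk)


paths = [write_sorted_file([(0.5, 1), (2.5, 2)]),
         write_sorted_file([(1.5, 3), (3.5, 4)])]

merged = heapq.merge(*(read_records(p) for p in paths))
first_two = list(itertools.islice(merged, 2))  # the heap is never drained
print(first_two)                               # [(0.5, 1), (1.5, 3)]
print([os.path.exists(p) for p in paths])      # [True, True]: files are left behind
```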

asked Feb 09 '23 04:02 by Rich


2 Answers

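When a generator goes out of scope and is garbage-collected, its close() method is called, which raises a GeneratorExit exception inside your generator function at the yield where it is paused. (In CPython this happens as soon as the last reference to the generator disappears; other implementations may delay it until a garbage-collection pass.)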

Simply handle that exception:

def _read_score_index_from_disk(file_name, buffer_size=8*10000):
    # ...

    try:
        ...  # generator loop
    except GeneratorExit:
        # clean up after the generator; don't yield again here
        raise
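A small self-contained demonstration of that mechanism: calling close() on a paused generator raises GeneratorExit at the yield, so the except block runs. The cleanup_log list is only a stand-in for real cleanup such as deleting a file. A generator that was never started has no paused yield, so closing it does not run the except block, and the deletion via del relies on CPython's reference counting.

```python
cleanup_log = []  # stand-in for real cleanup such as deleting a file


def numbers():
    try:
        for i in range(10):
            yield i
    except GeneratorExit:
        # Runs when close() is called while the generator is paused at yield.
        cleanup_log.append('closed early')
        raise  # re-raise (or return); yielding again here raises RuntimeError


gen = numbers()
print(next(gen), next(gen))  # 0 1
gen.close()                  # raises GeneratorExit inside numbers()
print(cleanup_log)           # ['closed early']

unstarted = numbers()
unstarted.close()            # never started, so the except block never runs
print(cleanup_log)           # ['closed early']

gen2 = numbers()
next(gen2)
del gen2                     # in CPython, deallocation calls close() immediately
print(cleanup_log)           # ['closed early', 'closed early']
```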

If you use finally: rather than except GeneratorExit:, the block also runs when any other exception propagates out of the generator (without catching it) and when the generator ends normally, and you don't have to handle GeneratorExit explicitly.
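Applied to the question's use case, a minimal sketch with finally might look like this: it deletes the temporary file both when the records are exhausted and when the generator is closed early. The names read_and_delete and make_file are hypothetical, and for brevity the reader keeps the file open while suspended instead of reopening it per buffer as the question's version does.

```python
import os
import struct
import tempfile

RECORD = struct.Struct('fi')  # same 8-byte (float, int) records as in the question


def read_and_delete(file_name):
    """Yield (score, index) tuples, then delete file_name however iteration ends."""
    try:
        with open(file_name, 'rb') as data_file:
            while True:
                packed = data_file.read(RECORD.size)
                if len(packed) < RECORD.size:
                    break
                yield RECORD.unpack(packed)
    finally:
        # Runs on normal exhaustion, on close() (GeneratorExit) and on errors.
        # The with block has already closed the file by now, which Windows needs.
        os.remove(file_name)


def make_file(n):
    """Hypothetical helper: write n records to a fresh temporary file."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, 'wb') as f:
        for i in range(n):
            f.write(RECORD.pack(float(i), i))
    return path


# Fully consumed: the file is removed after the last record.
path_a = make_file(3)
print(list(read_and_delete(path_a)))  # [(0.0, 0), (1.0, 1), (2.0, 2)]
print(os.path.exists(path_a))         # False

# Abandoned after one record: close() triggers the finally block.
path_b = make_file(3)
gen = read_and_delete(path_b)
print(next(gen))                      # (0.0, 0)
gen.close()
print(os.path.exists(path_b))         # False
```

One caveat: a generator that is never advanced never enters the try block, so closing it leaves its file in place.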

answered Feb 11 '23 23:02 by Martijn Pieters


You could create a context manager out of a function to handle any clean-up tasks.

Here's a simple example of what I mean:

from contextlib import contextmanager

def my_generator():
    for i in range(10):
        if i > 5:
            break
        yield i

@contextmanager
def generator_context():
    generator = my_generator()
    try:
        yield generator
    finally:
        # Runs even if the with block raises or stops iterating early
        generator.close()
        print("cleaning up")

with generator_context() as generator:
    for value in generator:
        print(value)

Output:

0
1
2
3
4
5
cleaning up
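Combined with the heap queue from the question, a variant of this idea could register every temporary file with a contextlib.ExitStack, so that all files are removed when the with block exits, whether or not the merge was drained. This is a sketch under assumptions: merged_scores, read_records and make_file are hypothetical names, and cleanup is tied to the with block rather than to garbage collection, so its timing doesn't depend on the Python implementation.

```python
import contextlib
import heapq
import itertools
import os
import struct
import tempfile

RECORD = struct.Struct('fi')  # 8-byte (float, int) records as in the question


def read_records(path):
    """Hypothetical reader: yield (score, index) tuples from one sorted file."""
    with open(path, 'rb') as f:
        while True:
            packed = f.read(RECORD.size)
            if len(packed) < RECORD.size:
                return
            yield RECORD.unpack(packed)


@contextlib.contextmanager
def merged_scores(paths):
    """Yield a heapq.merge over all files; delete every file on exit."""
    with contextlib.ExitStack() as stack:
        # Registered first, so these run last: after every generator is closed.
        for path in paths:
            stack.callback(os.remove, path)
        # closing() calls close() on each generator on exit, releasing its file.
        generators = [stack.enter_context(contextlib.closing(read_records(p)))
                      for p in paths]
        yield heapq.merge(*generators)


def make_file(values):
    """Hypothetical helper: write sorted (score, index) records to a temp file."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, 'wb') as f:
        for record in sorted(values):
            f.write(RECORD.pack(*record))
    return path


paths = [make_file([(0.5, 1), (2.5, 2)]), make_file([(1.5, 3), (3.5, 4)])]
with merged_scores(paths) as merged:
    top_two = list(itertools.islice(merged, 2))  # stop before the heap drains
print(top_two)                                   # [(0.5, 1), (1.5, 3)]
print([os.path.exists(p) for p in paths])        # [False, False]
```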
answered Feb 12 '23 01:02 by martineau