Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python: yield-and-delete

Tags:

python

How do I yield an object from a generator and forget it immediately, so that it doesn't take up memory?

For example, in the following function:

def grouper(iterable, chunksize):
    """
    Return elements from the iterable in `chunksize`-ed lists. The last returned
    element may be smaller (if length of collection is not divisible by `chunksize`).

    >>> print list(grouper(xrange(10), 3))
    [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
    """
    i = iter(iterable)
    while True:
        chunk = list(itertools.islice(i, int(chunksize)))
        if not chunk:
            break
        yield chunk

I don't want the function to hold on to the reference to chunk after yielding it, as it is not used further and just consumes memory, even if all outside references are gone.


EDIT: using standard Python 2.5/2.6/2.7 from python.org.


Solution (proposed almost simultaneously by @phihag and @Owen): wrap the result in a (small) mutable object and return the chunk anonymously, leaving only the small container behind:

def chunker(iterable, chunksize):
    """
    Return elements from the iterable in `chunksize`-ed lists. The last returned
    chunk may be smaller (if length of collection is not divisible by `chunksize`).

    >>> print list(chunker(xrange(10), 3))
    [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
    """
    i = iter(iterable)
    while True:
        wrapped_chunk = [list(itertools.islice(i, int(chunksize)))]
        if not wrapped_chunk[0]:
            break
        yield wrapped_chunk.pop()

With this memory optimization, you can now do something like:

 for big_chunk in chunker(some_generator, chunksize=10000):
     ... process big_chunk
     del big_chunk # big_chunk ready to be garbage-collected :-)
     ... do more stuff
like image 326
Radim Avatar asked Aug 20 '11 16:08

Radim


People also ask

Does yield stop execution Python?

Inside a program, when you call a function that has a yield statement, as soon as a yield is encountered, the execution of the function stops and returns an object of the generator to the function caller.

What happens after yield Python?

yield in Python can be used like the return statement in a function. When done so, the function instead of returning the output, it returns a generator that can be iterated upon. You can then iterate through the generator to extract items. Iterating is done using a for loop or simply using the next() function.

Should I use yield in Python?

We should use yield when we want to iterate over a sequence, but don't want to store the entire sequence in memory. Yield are used in Python generators. A generator function is defined like a normal function, but whenever it needs to generate a value, it does so with the yield keyword rather than return.

How do you return and yield in Python?

The yield statement hauls the function and returns back the value to the function caller and restart from where it is left off. The yield statement can be called multiple times. While the return statement ends the execution of the function and returns the value back to the caller.


2 Answers

After yield chunk, the variable value is never used again in the function, so a good interpreter/garbage collector will already free chunk for garbage collection (note: cpython 2.7 seems not do this, pypy 1.6 with default gc does). Therefore, you don't have to change anything but your code example, which is missing the second argument to grouper.

Note that garbage collection is non-deterministic in Python. The null garbage collector, which doesn't collect free objects at all, is a perfectly valid garbage collector. From the Python manual:

Objects are never explicitly destroyed; however, when they become unreachable they may be garbage-collected. An implementation is allowed to postpone garbage collection or omit it altogether — it is a matter of implementation quality how garbage collection is implemented, as long as no objects are collected that are still reachable.

Therefore, it can not be decided whether a Python program does or "doesn't take up memory" without specifying Python implementation and garbage collector. Given a specific Python implementation and garbage collector, you can use the gc module to test whether the object is freed.

That being said, if you really want no reference from the function (not necessarily meaning the object will be garbage-collected), here's how to do it:

def grouper(iterable, chunksize):
    i = iter(iterable)
    while True:
        tmpr = [list(itertools.islice(i, int(chunksize)))]
        if not tmpr[0]:
            break
        yield tmpr.pop()

Instead of a list, you can also use any other data structure that with a function which removes and returns an object, like Owen's wrapper.

like image 178
phihag Avatar answered Sep 19 '22 12:09

phihag


If you really really want to get this functionality I suppose you could use a wrapper:

class Wrap:

    def __init__(self, val):
        self.val = val

    def unlink(self):
        val = self.val
        self.val = None
        return val

And could be used like

def grouper(iterable, chunksize):
    i = iter(iterable)
    while True:
        chunk = Wrap(list(itertools.islice(i, int(chunksize))))
        if not chunk.val:
            break
        yield chunk.unlink()

Which is essentially the same as what phihag does with pop() ;)

like image 26
Owen Avatar answered Sep 23 '22 12:09

Owen