
Python generator of generators?

I wrote a class that reads a txt file. The file is composed of blocks of non-empty lines (let's call them "sections"), separated by an empty line:

line1.1
line1.2
line1.3

line2.1
line2.2

My first implementation was to read the whole file and return a list of lists, that is a list of sections, where each section is a list of lines. This was obviously terrible memory-wise.

So I re-implemented it as a generator of lists: on every iteration my class reads a whole section into memory as a list and yields it.
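A minimal sketch of such a generator of lists, assuming sections are separated by empty lines. The name iter_sections_as_lists is hypothetical and stands in for the actual class, which is not shown here:

```python
import io


def iter_sections_as_lists(lines):
    """Yield each blank-line-separated section as a list of lines.

    Hypothetical helper: only one section is held in memory at a time.
    """
    section = []
    for line in lines:
        line = line.rstrip("\n")
        if line:
            section.append(line)
        elif section:
            # An empty line closes the current section.
            yield section
            section = []
    if section:
        # The file may end without a trailing empty line.
        yield section


sample = io.StringIO("line1.1\nline1.2\nline1.3\n\nline2.1\nline2.2\n")
all_sections = list(iter_sections_as_lists(sample))
```

Calling list() on the generator gives the list of lists from the first implementation, which is why this variant covers the small-file use case.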

This is better, but it's still problematic in case of large sections. So I wonder if I can reimplement it as a generator of generators? The problem is that this class is very generic, and it should be able to satisfy both of these use cases:

  1. read a very big file, containing very big sections, and cycle through it only once. A generator of generators is perfect for this.
  2. read a smallish file into memory to be cycled over multiple times. A generator of lists works fine, because the user can just invoke

    list(MyClass(file_handle))

However, a generator of generators would NOT work in case 2, as the inner objects would not be transformed to lists.

Is there anything more elegant than implementing an explicit to_list() method, that would transform the generator of generators into a list of lists?

asked Sep 26 '13 16:09 by crusaderky



1 Answer

Python 2:

map(list, generator_of_generators)

Python 3:

list(map(list, generator_of_generators))

or for both:

[list(gen) for gen in generator_of_generators]

Since the generated objects are generator functions, not mere generators, you'd want to do

[list(gen()) for gen in generator_of_generator_functions]

If that doesn't work I have no idea what you're asking. Also, why would it return a generator function and not a generator itself?
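To make the conversions above concrete, here is a small sketch. Both generator_of_generators and generator_of_generator_functions are hypothetical stand-ins with made-up contents, and the inner generators are assumed to be independent of each other:

```python
def generator_of_generators():
    """Hypothetical stand-in: yields three independent inner generators."""
    for start in (1, 4, 7):
        yield (n for n in range(start, start + 3))


def generator_of_generator_functions():
    """Hypothetical stand-in: yields functions that must be called first."""
    for start in (1, 4, 7):
        def inner(start=start):  # bind the current value of start
            yield from range(start, start + 3)
        yield inner


# Works on both Python 2 and 3.
from_generators = [list(gen) for gen in generator_of_generators()]

# Python 3 spelling of map(list, ...).
from_map = list(map(list, generator_of_generators()))

# Generator functions need to be called before they can be listed.
from_functions = [list(gen()) for gen in generator_of_generator_functions()]
```

All three results are the same list of lists, [[1, 2, 3], [4, 5, 6], [7, 8, 9]].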


Since you said in the comments that you want to keep list(generator_of_generator_functions) from crashing mysteriously, the answer depends on what you really want.

  • It is not possible to override the behaviour of list in this way: either you store the sub-generator elements or you don't.

  • If you really do get a crash, I recommend exhausting the sub-generator inside the main generator's loop every time the main generator advances. This is standard practice and is what itertools.groupby, a stdlib generator of generators, does.

eg.

def metagen():
    def innergen():
        yield 1
        yield 2
        yield 3

    for i in range(3):
        r = innergen()
        yield r

        # Drain whatever the consumer left unread before moving on.
        for _ in r: pass

  • Or use a dark, secret hack method that I'll show in a mo' (I need to write it), but don't do it!
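For the file-of-sections case in the question, the exhaust-on-advance approach from the second bullet can be sketched with itertools.groupby directly. This is only a sketch: iter_sections is a hypothetical name, and it assumes separator lines are empty or whitespace-only.

```python
import io
from itertools import groupby


def iter_sections(lines):
    """Yield one lazy iterator of lines per section (hypothetical helper)."""
    stripped = (line.rstrip("\n") for line in lines)
    # Consecutive lines stay in one group while their "is blank" flag matches.
    for is_blank, section in groupby(stripped, key=lambda line: not line.strip()):
        if not is_blank:
            yield section


sample = io.StringIO("line1.1\nline1.2\nline1.3\n\nline2.1\nline2.2\n")

# Each section is consumed before the outer loop advances.
sections = [list(section) for section in iter_sections(sample)]
```

Each section has to be consumed before the outer loop moves on. Once groupby advances, the previous section's iterator no longer sees that section's lines, so list(iter_sections(f)) on its own does not produce a list of lists.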

As promised, the hack (for Python 3, this time 'round):

from collections import UserList
from functools import partial


def objectitemcaller(key):
    def inner(*args, **kwargs):
        try:
            return getattr(object, key)(*args, **kwargs)
        except AttributeError:
            return NotImplemented
    return inner


class Listable(UserList):
    def __init__(self, iterator):
        self.iterator = iterator
        self.iterated = False

    def __iter__(self):
        return self

    def __next__(self):
        self.iterated = True
        return next(self.iterator)

    def _to_list_hack(self):
        self.data = list(self)
        del self.iterated
        del self.iterator
        self.__class__ = UserList

for key in UserList.__dict__.keys() - Listable.__dict__.keys():
    if key not in ["__class__", "__dict__", "__module__", "__subclasshook__"]:
        setattr(Listable, key, objectitemcaller(key))


def metagen():
    def innergen():
        yield 1
        yield 2
        yield 3

    for i in range(3):
        r = Listable(innergen())
        yield r

        if not r.iterated:
            r._to_list_hack()

        else:
            for item in r: pass

for item in metagen():
    print(item)
    print(list(item))
#>>> <Listable object at 0x7f46e4a4b850>
#>>> [1, 2, 3]
#>>> <Listable object at 0x7f46e4a4b950>
#>>> [1, 2, 3]
#>>> <Listable object at 0x7f46e4a4b990>
#>>> [1, 2, 3]

list(metagen())
#>>> [[1, 2, 3], [1, 2, 3], [1, 2, 3]]

It's so bad I don't want to even explain it.

The key is that you have a wrapper that can detect whether it has been iterated, and if not you run a _to_list_hack that, I kid you not, changes the __class__ attribute.

Because of conflicting layouts we have to use the UserList class and shadow all of its methods, which is just another layer of crud.
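The layout point can be seen in isolation with a small sketch. Lazy and Plain are hypothetical classes made up for this illustration and are not part of the hack above:

```python
from collections import UserList


class Lazy(UserList):
    """Hypothetical subclass that adds no storage of its own."""


class Plain:
    """Hypothetical ordinary class."""


wrapper = Lazy()
wrapper.data = [1, 2, 3]
# Allowed: Lazy and UserList are both Python-level classes with the same layout.
wrapper.__class__ = UserList

layout_error = None
try:
    # Rejected: the built-in list type is not a compatible target.
    Plain().__class__ = list
except TypeError as exc:
    layout_error = exc
```

After the swap, wrapper is a plain UserList holding [1, 2, 3], while the attempt to turn an ordinary object into a built-in list raises TypeError.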

Basically, please don't use this hack. You can enjoy it as humour, though.

answered Nov 23 '22 15:11 by Veedrac