I wrote a class that reads a text file. The file is composed of blocks of non-empty lines (let's call them "sections"), separated by an empty line:
line1.1
line1.2
line1.3

line2.1
line2.2
My first implementation was to read the whole file and return a list of lists, that is, a list of sections, where each section is a list of lines. This was obviously terrible memory-wise.
So I re-implemented it as a generator of lists, that is, on every cycle my class reads a whole section into memory as a list and yields it.
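A minimal sketch of that generator-of-lists version, assuming sections are separated by a single blank line (read_sections is an illustrative stand-in for the class):

def read_sections(file_handle):
    # Yield each blank-line-separated section as a list of its lines.
    section = []
    for line in file_handle:
        line = line.rstrip("\n")
        if line:
            section.append(line)
        elif section:
            yield section
            section = []
    if section:  # a final section may not be followed by a blank line
        yield section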
This is better, but it is still problematic in case of large sections. So I wonder if I can reimplement it as a generator of generators? The problem is that this class is very generic, and it should be able to satisfy both of these use cases:

1. read a huge file composed of huge sections, cycling through each section only once;
2. read a smallish file into memory to be cycled over multiple times. A generator of lists works fine here, because the user can just invoke

list(MyClass(file_handle))

However, a generator of generators would NOT work in case 2, as the inner objects would not be transformed into lists.
Is there anything more elegant than implementing an explicit to_list() method that would transform the generator of generators into a list of lists?
Python 2:
map(list, generator_of_generators)
Python 3:
list(map(list, generator_of_generators))
or for both:
[list(gen) for gen in generator_of_generators]
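For instance, with a toy generator of generators (the names here are illustrative):

def generator_of_generators():
    for start in (1, 4):
        yield (start + i for i in range(3))

print([list(gen) for gen in generator_of_generators()])
#>>> [[1, 2, 3], [4, 5, 6]]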
Since the generated objects are generator functions, not mere generators, you'd want to do
[list(gen()) for gen in generator_of_generator_functions]
If that doesn't work, I have no idea what you're asking. Also, why would it return a generator function and not a generator itself?
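For reference, this is what a generator of generator functions looks like, as opposed to a generator of generators (a contrived sketch):

def gen_of_genfuncs():
    def make():
        yield 1
        yield 2
    for _ in range(2):
        yield make  # yields the function itself; the caller must call it

print([list(gen()) for gen in gen_of_genfuncs()])
#>>> [[1, 2], [1, 2]]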
Since in the comments you said you wanted to keep list(generator_of_generator_functions) from crashing mysteriously, this depends on what you really want. It is not possible to overwrite the behaviour of list in this way: either you store the sub-generator elements or you don't.
If you really do get a crash, I recommend exhausting the sub-generator with the main generator loop every time the main generator iterates. This is standard practice and exactly what itertools.groupby, a stdlib generator-of-generators, does.
e.g.

def metagen():
    def innergen():
        yield 1
        yield 2
        yield 3

    for i in range(3):
        r = innergen()
        yield r
        for _ in r: pass  # exhaust the sub-generator before moving on
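itertools.groupby follows the same contract: each group is a sub-iterator over the shared underlying iterator, and advancing to the next group silently discards whatever is left of the previous one. A small demonstration on blank-line-separated sections (a sketch, not the asker's class):

from itertools import groupby

lines = ["line1.1", "line1.2", "", "line2.1", "line2.2"]
for nonempty, section in groupby(lines, key=bool):
    if nonempty:  # skip the runs of blank separator lines
        print(list(section))
#>>> ['line1.1', 'line1.2']
#>>> ['line2.1', 'line2.2']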
As promised, the hack (for Python 3, this time 'round):
from collections import UserList

def objectitemcaller(key):
    # Build a method that defers to object's implementation, or opts out.
    def inner(*args, **kwargs):
        try:
            return getattr(object, key)(*args, **kwargs)
        except AttributeError:
            return NotImplemented
    return inner

class Listable(UserList):
    def __init__(self, iterator):
        self.iterator = iterator
        self.iterated = False

    def __iter__(self):
        return self

    def __next__(self):
        self.iterated = True
        return next(self.iterator)

    def _to_list_hack(self):
        # Materialize the iterator, then mutate the instance into a plain UserList.
        self.data = list(self)
        del self.iterated
        del self.iterator
        self.__class__ = UserList

# Shadow every UserList method Listable doesn't define, so nothing
# touches self.data before the hack has run.
for key in UserList.__dict__.keys() - Listable.__dict__.keys():
    if key not in ["__class__", "__dict__", "__module__", "__subclasshook__"]:
        setattr(Listable, key, objectitemcaller(key))
def metagen():
    def innergen():
        yield 1
        yield 2
        yield 3

    for i in range(3):
        r = Listable(innergen())
        yield r
        if not r.iterated:
            r._to_list_hack()  # caller didn't consume it; snapshot to a list
        else:
            for item in r: pass  # caller started consuming; just exhaust it
for item in metagen():
    print(item)
    print(list(item))
#>>> <Listable object at 0x7f46e4a4b850>
#>>> [1, 2, 3]
#>>> <Listable object at 0x7f46e4a4b950>
#>>> [1, 2, 3]
#>>> <Listable object at 0x7f46e4a4b990>
#>>> [1, 2, 3]

list(metagen())
#>>> [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
It's so bad I don't want to even explain it.
The key is that you have a wrapper that can detect whether it has been iterated, and if not you run a _to_list_hack that, I kid you not, changes the __class__ attribute.
Because of conflicting layouts we have to use the UserList class and shadow all of its methods, which is just another layer of crud.
Basically, please don't use this hack. You can enjoy it as humour, though.