How would I turn a generator of pairs (tuples):
tuple_gen = (i for i in [(1, "a"), (2, "b"), (3, "c")])
Into two generators which would yield [1, 2, 3] and ["a", "b", "c"]?
I need to process the first and second elements of the tuples separately, and the processing functions expect an iterable.
The generator is very large (millions of items) so I'd like to avoid having all items in memory at the same time unless there is no other solution.
yield is a Python keyword that returns a value from a function while preserving the state of its local variables; when the generator is resumed, execution continues right after the last yield statement. Any function that contains yield is a generator function.
A generator expression, such as (i for i in ...), is an expression that evaluates to a generator object, just as calling a generator function returns one.
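As a small illustration of the two points above (the function name counter is hypothetical), a generator function resumes after its last yield with its local variables intact, and a generator expression produces an equivalent generator object:

```python
def counter(limit):
    # n survives between calls to next(): each resume continues after the yield
    n = 0
    while n < limit:
        yield n
        n += 1

gen_from_function = counter(3)
gen_from_expression = (n for n in range(3))

print(next(gen_from_function))    # 0
print(next(gen_from_function))    # 1 (resumed after the yield, n was kept)
print(list(gen_from_expression))  # [0, 1, 2]
```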
You can create n distinct iterators using the tee function from the itertools module. You would then iterate over them separately:
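from itertools import tee
# tee takes n positionally; tee(tuple_gen, n=2) raises TypeError
i1, i2 = tee(tuple_gen, 2)
firsts = (x[0] for x in i1)   # lazily yields 1, 2, 3
seconds = (x[1] for x in i2)  # lazily yields "a", "b", "c"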
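tee only keeps memory small while its iterators advance roughly together, because it must buffer every item one iterator has seen and the other has not. A minimal sketch that makes this visible, assuming a hypothetical source() stand-in for tuple_gen that records each tuple it produces in pulled:

```python
from itertools import tee

pulled = []  # every tuple the source generator has produced so far

def source():
    # hypothetical stand-in for the large tuple_gen
    for pair in [(1, "a"), (2, "b"), (3, "c")]:
        pulled.append(pair)
        yield pair

i1, i2 = tee(source(), 2)
firsts = (x[0] for x in i1)
seconds = (x[1] for x in i2)

# Advancing both iterators together pulls only one tuple from the source,
# so tee has at most one pending tuple to buffer.
print(next(firsts), next(seconds))  # 1 a
print(len(pulled))                  # 1
```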
There's a fundamental problem here. Say you get your two iterators iter1 and iter2, and you pass iter1 to a function that eats the whole thing:
def consume(iterable):
    for thing in iterable:
        do_stuff_with(thing)

consume(iter1)
That's going to need to iterate through all of tuple_gen to get the first items, and then what do you do with the second items? Unless you're okay with rerunning the generator to get the second items again, you need to store all of them (in memory, unless you can persist them to disk or something), so you're not much better off than if you'd just dumped tuple_gen into a list.
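The same effect can be observed with tee. In this sketch (source() and pulled are hypothetical, as a stand-in for tuple_gen), fully consuming the first iterator exhausts the source, yet the second iterator can still replay every tuple, which means tee held all of them in memory on its behalf:

```python
from itertools import tee

pulled = []  # every tuple the source generator has produced

def source():
    # hypothetical stand-in for the large tuple_gen
    for pair in [(1, "a"), (2, "b"), (3, "c")]:
        pulled.append(pair)
        yield pair

i1, i2 = tee(source(), 2)
firsts = [x[0] for x in i1]   # consumes i1 completely, like consume(iter1)

# The source is now exhausted; i2 is served entirely from tee's buffer.
print(len(pulled))            # 3
seconds = [x[1] for x in i2]
print(seconds)                # ['a', 'b', 'c']
```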
If you do this, you have to consume the iterators in parallel, or run the underlying generator twice, or spend a lot of memory saving the tuple elements you're not processing so the other iterator can go over them. Unfortunately, consuming the iterators in parallel will require either rewriting the consumer functions or running them in separate threads. Running the generator twice is simplest if you can do it, but not always an option.
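Two of those alternatives, sketched under stated assumptions: make_tuple_gen is a hypothetical factory that can rebuild the generator on demand (only possible if the data source can be re-read), and the running sum and string concatenation are hypothetical stand-ins for consumers rewritten to handle one item at a time:

```python
def make_tuple_gen():
    # hypothetical factory: re-creates the (large) generator on each call,
    # so each consumer gets a fresh pass instead of sharing a buffer
    return (i for i in [(1, "a"), (2, "b"), (3, "c")])

# Option 1: run the underlying generator twice, once per element.
firsts = (x[0] for x in make_tuple_gen())
seconds = (x[1] for x in make_tuple_gen())
print(list(firsts))   # [1, 2, 3]
print(list(seconds))  # ['a', 'b', 'c']

# Option 2: consume in parallel in a single pass, with the consumers
# rewritten to process one element at a time (nothing is buffered).
total = 0    # hypothetical per-item work on the first elements
joined = ""  # hypothetical per-item work on the second elements
for first, second in make_tuple_gen():
    total += first
    joined += second
print(total, joined)  # 6 abc
```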