
Turning a generator of pairs into a pair of generators

How would I turn a generator of pairs (tuples):

tuple_gen = (i for i in [(1, "a"), (2, "b"), (3, "c")])

Into two generators which would yield [1, 2, 3] and ["a", "b", "c"]?

I need to process the first and second elements of the tuples separately, and the processing functions expect an iterable.

The generator is very large (millions of items) so I'd like to avoid having all items in memory at the same time unless there is no other solution.

asachet asked Nov 06 '17 22:11



2 Answers

You can create n independent iterators using the tee function from the itertools module. You would then iterate over them separately:

from itertools import tee

# tee() does not accept n as a keyword argument, so pass it positionally
i1, i2 = tee(tuple_gen, 2)
firsts = (x[0] for x in i1)
seconds = (x[1] for x in i2)
bow answered Nov 14 '22 22:11


There's a fundamental problem here. Say you get your two iterators iter1 and iter2, and you pass iter1 to a function that eats the whole thing:

def consume(iterable):
    for thing in iterable:
        do_stuff_with(thing)

consume(iter1)

That has to iterate through all of tuple_gen to get the first items, and then what do you do with the second items? Unless you're okay with rerunning the generator to get them again, you need to store all of them somewhere: in memory, unless you can persist them to disk or something. So you're not much better off than if you'd just dumped tuple_gen into a list.
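To make the storage problem concrete, here is a minimal sketch using itertools.tee. The pair_source function and the produced counter are hypothetical stand-ins for the real generator, and sum() plays the role of a consumer that eats the whole iterator. Once the first iterator is exhausted, the source has been fully drained even though nothing has been read from the second one, so tee has to be holding every pair for it.

```python
from itertools import tee

produced = {"n": 0}  # counts how many pairs the source has yielded so far

def pair_source(n):
    # Hypothetical stand-in for the large tuple_gen from the question.
    for i in range(n):
        produced["n"] += 1
        yield (i, str(i))

iter1, iter2 = tee(pair_source(1000))
firsts = (x[0] for x in iter1)
seconds = (x[1] for x in iter2)

# Stand-in for consume(iter1): a function that eats the whole iterator.
total = sum(firsts)

# The source is now exhausted although `seconds` has yielded nothing yet,
# so every pair is being buffered by tee for iter2.
source_drained = produced["n"] == 1000

# Only now does the second iterator start handing out the buffered items.
first_second = next(seconds)
```

With a million items instead of a thousand, that buffer would hold the whole stream, which is roughly what dumping tuple_gen into a list would cost.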


If you do this, you have to consume the iterators in parallel, or run the underlying generator twice, or spend a lot of memory saving the tuple elements you're not processing so the other iterator can go over them. Unfortunately, consuming the iterators in parallel will require either rewriting the consumer functions or running them in separate threads. Running the generator twice is simplest if you can do it, but not always an option.
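For the separate-threads option, one possible sketch (not the only way to do it) pushes each half of every pair onto its own bounded queue and runs each unmodified consumer in its own thread. The names consume_in_parallel, maxsize and _DONE are made up for illustration; the consumers are assumed to be functions that take a single iterable, as in the question.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of each stream

def consume_in_parallel(pairs, consume_first, consume_second, maxsize=1024):
    """Run two consumers in threads, feeding each one half of every pair.

    maxsize bounds how many items each queue can hold, so memory use
    stays bounded no matter how long `pairs` is.
    """
    queues = [queue.Queue(maxsize), queue.Queue(maxsize)]
    consumers = [consume_first, consume_second]
    results = [None, None]

    def run(index):
        # iter(callable, sentinel) keeps calling get() until it returns _DONE.
        stream = iter(queues[index].get, _DONE)
        results[index] = consumers[index](stream)

    threads = [threading.Thread(target=run, args=(i,)) for i in range(2)]
    for t in threads:
        t.start()

    # The calling thread drives the generator exactly once.
    for first, second in pairs:
        queues[0].put(first)   # blocks while the queue is full
        queues[1].put(second)
    for q in queues:
        q.put(_DONE)
    for t in threads:
        t.join()
    return results

tuple_gen = (i for i in [(1, "a"), (2, "b"), (3, "c")])
firsts_result, seconds_result = consume_in_parallel(tuple_gen, list, list)
```

Here list stands in for the real processing functions. The sketch assumes both consumers read their stream to the end: if one stops early or raises, the producing loop can block on a full queue, so real code would need some error handling around that.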

user2357112 supports Monica answered Nov 14 '22 22:11