I have a generator that is roughly as follows:
def gen1():
    for x, y in enumerate(xrange(20)):
        a = 5*x
        b = 10*y
        yield a, b
From this generator, I would like to create 2 separate generators as follows:
for a in gen1_split_a():
    yield a
for b in gen1_split_b():
    yield b
What's my play, SO?
Iterators and generators can't normally be sliced, because no information is known about their length (and they don't implement indexing). The result of islice() is an iterator that produces the desired slice items, but it does this by consuming and discarding all of the items up to the starting slice index.
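As a small illustration of the islice() behaviour described above, here is a minimal Python 3 sketch. The squares() generator is a hypothetical example, not part of the question; it only exists to show that the skipped items really are consumed.

```python
from itertools import islice

def squares():
    """Hypothetical infinite generator, used only for illustration."""
    n = 0
    while True:
        yield n * n
        n += 1

gen = squares()
# Take the items at positions 3, 4 and 5; positions 0..2 are consumed and discarded.
window = list(islice(gen, 3, 6))
# The underlying generator has advanced past position 5, so this is position 6.
following = next(gen)
```

Note that the "slice" is not repeatable: the items it skipped and the items it returned are gone from gen afterwards.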
You can carry out the unpacking procedure for all kinds of iterables like lists, tuples, strings, iterators and generators.
Yes, a generator can be consumed only once.
The next() function returns the next item from an iterator. You can pass a default value, which is returned instead of raising StopIteration once the iterator is exhausted.
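A short sketch of next() with and without a default, assuming Python 3. The list and the "done" sentinel are arbitrary illustrative values.

```python
it = iter([1, 2])

first = next(it)
second = next(it)
# The iterator is now exhausted. Without a default this call would raise
# StopIteration; with one, the default is returned instead.
third = next(it, "done")
```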
I have a solution that might not exactly be what you want. It separates an n-tuple generator into a tuple of n individual generators. It requires, however, that each individual value of the current tuple has been returned before it proceeds to the next tuple. Strictly speaking, it "splits" an n-tuple generator into n generators, but your example won't work as presented.
It exploits Python's ability to send values back into a generator to influence future yields. The same idea should also be implementable with classes instead but I wanted to get to grips with generators anyway.
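For readers unfamiliar with sending values into a generator, here is a minimal standalone sketch of the mechanism in Python 3. The running_total() generator is a hypothetical example and is not part of the solution; it only shows how a yield expression receives the argument of send().

```python
def running_total():
    """Hypothetical example: yields the sum of all values sent in so far."""
    total = 0
    while True:
        # `yield total` hands the current total to the caller, then evaluates
        # to whatever the caller passes to send().
        received = yield total
        total += received

acc = running_total()
primed = next(acc)       # run to the first yield; same as acc.send(None)
after_5 = acc.send(5)
after_7 = acc.send(7)
```

The generator has to be advanced to its first yield before a non-None value can be sent, which is why the solution primes its callback generator with send(None).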
When the new generators are initialized, they only know the current n-tuple. Every time they yield the value at their respective index, a callback is performed that informs a higher level generator of this index. Once all indices of the current tuple have been yielded, the higher level generator moves on to the next tuple and the process repeats.
It may be a bit unwieldy, but here is the code (Python 3.6).
from typing import TypeVar, Generator, Tuple, Iterator, Optional

TYPE_A = TypeVar("TYPE_A")


def _next_value(source: Iterator[Tuple[TYPE_A, ...]], size: int) -> Generator[Tuple[TYPE_A, ...], Optional[int], None]:
    checked = [False for _ in range(size)]
    value = next(source)

    while True:
        index = yield value
        if all(checked):
            value = next(source)
            for _i in range(len(checked)):
                checked[_i] = False
        checked[index] = True


def _sub_iterator(index: int, callback: Generator[Tuple[TYPE_A, ...], int, None]) -> Generator[TYPE_A, None, None]:
    while True:
        value = callback.send(index)
        yield value[index]


def split_iterator(source: Iterator[Tuple[TYPE_A, ...]], size: int) -> Tuple[Generator[TYPE_A, Optional[TYPE_A], None], ...]:
    generators = []

    _cb = _next_value(source, size)
    _cb.send(None)

    for _i in range(size):
        each_generator = _sub_iterator(_i, _cb)
        generators.append(each_generator)

    return tuple(generators)


if __name__ == "__main__":
    def triple():
        _i = 0
        while True:
            yield tuple(range(_i, _i + 3))
            _i += 1

    g = triple()
    for i, each_value in enumerate(g):
        if i >= 5:
            break
        print(each_value)

    print()

    g = triple()
    a_gen, b_gen, c_gen = split_iterator(g, 3)
    for i, (a_value, b_value, c_value) in enumerate(zip(a_gen, b_gen, c_gen)):
        if i >= 5:
            break
        print((a_value, b_value, c_value))
triple() is a 3-tuple generator and split_iterator() produces three generators, each of which yields one index from the tuples yielded by triple(). Each individual _sub_iterator progresses only once all values from the current tuple have been yielded.
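One caveat when the source is finite: from Python 3.7 on (PEP 479), a StopIteration that escapes from next(source) inside a generator body is turned into a RuntimeError, so the code above is best suited to endless sources such as triple(). The following is a rough sketch of the same lock-step idea that ends cleanly instead. It is a variant using closures and shared state rather than send(), and split_lockstep is a hypothetical name, not the code from this answer.

```python
def split_lockstep(source, size):
    """Split an iterator of size-tuples into `size` lock-step generators."""
    # Start with every slot marked as seen so that the first request
    # fetches the first tuple from the source.
    state = {"value": None, "seen": [True] * size, "done": False}

    def advance():
        try:
            state["value"] = next(source)
        except StopIteration:
            state["done"] = True
        state["seen"] = [False] * size

    def sub(index):
        while True:
            # Move on only once every index of the current tuple was yielded.
            if all(state["seen"]):
                advance()
            if state["done"]:
                return  # ends this generator without a RuntimeError
            state["seen"][index] = True
            yield state["value"][index]

    return tuple(sub(i) for i in range(size))


pairs = iter([(1, "a"), (2, "b")])
nums, letters = split_lockstep(pairs, 2)
result = list(zip(nums, letters))
```

As with the original, a sub-generator that is asked again before its siblings have caught up simply repeats its value from the current tuple.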
You can't, not without ending up holding all generator output just to be able to produce b values in the second loop. That can get costly in terms of memory.
You'd use itertools.tee() to 'duplicate' the generator:
from itertools import tee

def split_gen(gen):
    gen_a, gen_b = tee(gen, 2)
    return (a for a, b in gen_a), (b for a, b in gen_b)

# call gen1() so that tee() receives an iterator, not the function itself
gen1_split_a, gen1_split_b = split_gen(gen1())
for a in gen1_split_a:
    print(a)
for b in gen1_split_b:
    print(b)
but what happens in this case is that the tee object will end up having to store everything gen1 produces. From the documentation:
This itertool may require significant auxiliary storage (depending on how much temporary data needs to be stored). In general, if one iterator uses most or all of the data before another iterator starts, it is faster to use list() instead of tee().
Following that advice, just put the b values into a list for the second loop:
b_values = []
for a, b in gen1():
    print(a)
    b_values.append(b)  # keep b (not a) for the second loop
for b in b_values:
    print(b)
or better yet, just process both a and b in the one loop.
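A minimal sketch of the single-loop approach, assuming Python 3 (so range replaces xrange in the question's generator). The running sums are only stand-ins for whatever the two separate loops would have done with a and b.

```python
def gen1():
    # Python 3 port of the generator from the question.
    for x, y in enumerate(range(20)):
        yield 5 * x, 10 * y

total_a = 0
total_b = 0
for a, b in gen1():
    total_a += a   # stand-in for the work of the "a" loop
    total_b += b   # stand-in for the work of the "b" loop
```

Nothing is buffered here: each pair is handled as soon as it is produced and then discarded.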