 

How to split a Python generator of tuples into 2 separate generators?

I have a generator that is roughly as follows:

def gen1():
    for x, y in enumerate(xrange(20)):
        a = 5*x
        b = 10*y
        yield a, b

From this generator, I would like to create 2 separate generators as follows:

for a in gen1_split_a():
    yield a

for b in gen1_split_b():
    yield b

What's my play, SA?

asked Jan 19 '15 by cavaunpeu


2 Answers

I have a solution that might not be exactly what you want. It separates an n-tuple generator into a tuple of n individual generators. It requires, however, that each individual value of the current tuple has been consumed before it proceeds to the next tuple. Strictly speaking it does "split" an n-tuple generator into n generators, but your example won't work as presented.

It exploits Python's ability to send values back into a generator to influence future yields. The same idea should also be implementable with classes instead, but I wanted to get to grips with generators anyway.

When the new generators are initialized, they only know the current n-tuple. Every time they yield the value at their respective index, a callback is performed that informs a higher level generator of this index. Once all indices of the current tuple have been yielded, the higher level generator moves on to the next tuple and the process repeats.
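For readers unfamiliar with the mechanism, here is a minimal standalone sketch of generator .send() (not part of the answer's code; the names are made up for illustration): the value passed to send() becomes the result of the paused yield expression inside the generator.

def echo_multiplier():
    # whatever is sent in becomes the new factor for the next yield
    factor = 1
    while True:
        factor = yield factor * 10

gen = echo_multiplier()
print(next(gen))    # prime the generator -> 10
print(gen.send(3))  # 3 becomes factor    -> 30
print(gen.send(7))  # 7 becomes factor    -> 70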

It may be a bit unwieldy, but here is the code (Python 3.6).

from typing import TypeVar, Generator, Tuple, Iterator, Optional

TYPE_A = TypeVar("TYPE_A")


def _next_value(source: Iterator[Tuple[TYPE_A, ...]], size: int) -> Generator[Tuple[TYPE_A, ...], Optional[int], None]:
    # Coordinating generator: holds the current tuple and only advances
    # `source` once every index of that tuple has been reported via send().
    checked = [False for _ in range(size)]
    value = next(source)
    while True:
        index = yield value
        if all(checked):
            # every index of the current tuple has been consumed, move on
            value = next(source)
            for _i in range(len(checked)):
                checked[_i] = False
        checked[index] = True


def _sub_iterator(index: int, callback: Generator[Tuple[TYPE_A, ...], Optional[int], None]) -> Generator[TYPE_A, None, None]:
    # Asks the coordinating generator for the current tuple and yields only
    # the element at this sub-iterator's index.
    while True:
        value = callback.send(index)
        yield value[index]


def split_iterator(source: Iterator[Tuple[TYPE_A, ...]], size: int) -> Tuple[Generator[TYPE_A, None, None], ...]:
    generators = []

    _cb = _next_value(source, size)
    _cb.send(None)  # prime the coordinating generator

    for _i in range(size):
        each_generator = _sub_iterator(_i, _cb)
        generators.append(each_generator)

    return tuple(generators)


if __name__ == "__main__":
    def triple():
        _i = 0
        while True:
            yield tuple(range(_i, _i + 3))
            _i += 1

    g = triple()
    for i, each_value in enumerate(g):
        if i >= 5:
            break
        print(each_value)

    print()

    g = triple()
    a_gen, b_gen, c_gen = split_iterator(g, 3)
    for i, (a_value, b_value, c_value) in enumerate(zip(a_gen, b_gen, c_gen)):
        if i >= 5:
            break
        print((a_value, b_value, c_value))

triple() is a 3-tuple generator and split_iterator() produces three generators, each of which yields the value at one index of the tuples produced by triple(). Each individual _sub_iterator only progresses once all values of the current tuple have been yielded.

answered Oct 27 '22 by wehnsdaefflae


You can't, not without ending up holding all generator output just to be able to produce b values in the second loop. That can get costly in terms of memory.

You'd use itertools.tee() to 'duplicate' the generator:

from itertools import tee

def split_gen(gen):
    # tee() gives two independent iterators over the same underlying iterable
    gen_a, gen_b = tee(gen, 2)
    return (a for a, b in gen_a), (b for a, b in gen_b)

gen1_split_a, gen1_split_b = split_gen(gen1())

for a in gen1_split_a:
    print a

for b in gen1_split_b:
    print b

but what happens in this case is that the tee object will end up having to store everything gen1() produces. From the documentation:

This itertool may require significant auxiliary storage (depending on how much temporary data needs to be stored). In general, if one iterator uses most or all of the data before another iterator starts, it is faster to use list() instead of tee().

Following that advice, just put the b values into a list for the second loop:

b_values = []
for a, b in gen1():
    print a
    b_values.append(b)

for b in b_values:
    print b

or better yet, just process both a and b in the one loop.
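
A minimal sketch of that single-pass approach, reusing the same gen1() and print calls as above:

for a, b in gen1():
    # handle both values of each tuple in one pass; no buffering needed
    print a
    print b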

answered Oct 27 '22 by Martijn Pieters