
Can yield produce multiple consecutive generators?

Here are two functions that split the items of an iterable into sub-lists. I believe this kind of task gets programmed over and over. I use them to parse log files that consist of repr lines like ('result', 'case', 123, 4.56) and ('dump', ..) and so on.

I would like to change these so that they yield iterators rather than lists. A list may grow pretty large, yet I may be able to decide whether to take it or skip it based on its first few items. Also, if iterator versions were available I would like to nest them; with the list versions, nesting would waste memory by duplicating parts.

But deriving multiple generators from one iterable source wasn't easy for me, so I am asking for help. If possible, I wish to avoid introducing new classes.

Also, if you know a better title for this question, please tell me.

Thank you!

def cleave_by_mark (stream, key_fn, end_with_mark=False):
    '''[f f t][t][f f] (true) [f f][t][t f f](false)'''
    buf = []
    for item in stream:
        if key_fn(item):
            if end_with_mark: buf.append(item)
            if buf: yield buf
            buf = []
            if end_with_mark: continue
        buf.append(item)
    if buf: yield buf

def cleave_by_change (stream, key_fn):
    '''[1 1 1][2 2][3][2 2 2 2]'''
    prev = None
    buf = []
    for item in stream:
        iden = key_fn(item)
        if prev is None: prev = iden
        if prev != iden:
            yield buf
            buf = []
            prev = iden
        buf.append(item)
    if buf: yield buf
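
The docstring patterns are terse, so here is a small check of what the two list-based functions produce. This is only a sketch in Python 3 syntax (an assumption, since the code in this question is Python 2), with the function bodies copied from above and made-up sample data.

```python
def cleave_by_mark(stream, key_fn, end_with_mark=False):
    """Split at items for which key_fn is true (the "marks")."""
    buf = []
    for item in stream:
        if key_fn(item):
            if end_with_mark:
                buf.append(item)   # the mark closes the current chunk
            if buf:
                yield buf
            buf = []
            if end_with_mark:
                continue
        # non-marks, and marks that open a new chunk, land here
        buf.append(item)
    if buf:
        yield buf


def cleave_by_change(stream, key_fn):
    """Split wherever key_fn(item) differs from the previous item's key."""
    prev = None
    buf = []
    for item in stream:
        iden = key_fn(item)
        if prev is None:
            prev = iden
        if prev != iden:
            yield buf
            buf = []
            prev = iden
        buf.append(item)
    if buf:
        yield buf


sample = [0, 0, 1, 1, 0, 0]   # made-up data: 1 is a mark
starts = list(cleave_by_mark(sample, bool))
ends = list(cleave_by_mark(sample, bool, end_with_mark=True))
runs = list(cleave_by_change([1, 1, 1, 2, 2, 3, 2, 2, 2, 2], lambda x: x))

print(starts)  # [[0, 0], [1], [1, 0, 0]]
print(ends)    # [[0, 0, 1], [1], [0, 0]]
print(runs)    # [[1, 1, 1], [2, 2], [3], [2, 2, 2, 2]]
```

Every chunk is a fully built list here, which is exactly the memory cost the question wants to avoid.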

edit: my own answer

Thanks to everyone's answers, I was able to write what I asked for! Of course, for the "cleave_by_change" function I could also use itertools.groupby.

def cleave_by_mark (stream, key_fn, end_with_mark=False):
    hand = []
    def gen ():
        key = key_fn(hand[0])
        yield hand.pop(0)
        while 1:
            if end_with_mark and key: break
            hand.append(stream.next())
            key = key_fn(hand[0])
            if (not end_with_mark) and key: break
            yield hand.pop(0)
    while 1:
        # allow StopIteration in the main loop
        if not hand: hand.append(stream.next())
        yield gen()

for cl in cleave_by_mark (iter((1,0,0,1,1,0)), lambda x:x):
    print list(cl),  # start with 1
# -> [1, 0, 0] [1] [1, 0]
for cl in cleave_by_mark (iter((0,1,0,0,1,1,0)), lambda x:x):
    print list(cl),
# -> [0] [1, 0, 0] [1] [1, 0]
for cl in cleave_by_mark (iter((1,0,0,1,1,0)), lambda x:x, True):
    print list(cl),  # end with 1
# -> [1] [0, 0, 1] [1] [0]
for cl in cleave_by_mark (iter((0,1,0,0,1,1,0)), lambda x:x, True):
    print list(cl),
# -> [0, 1] [0, 0, 1] [1] [0]


def cleave_by_change (stream, key_fn):
    '''[1 1 1][2 2][3][2 2 2 2]'''
    hand = []
    def gen ():
        headkey = key_fn(hand[0])
        yield hand.pop(0)
        while 1:
            hand.append(stream.next())
            key = key_fn(hand[0])
            if key != headkey: break
            yield hand.pop(0)
    while 1:
        # allow StopIteration in the main loop
        if not hand: hand.append(stream.next())
        yield gen()

for cl in cleave_by_change (iter((1,1,1,2,2,2,3,2)), lambda x:x):
    print list(cl),
# -> [1, 1, 1] [2, 2, 2] [3] [2]

CAUTION: If anyone is going to use these, be sure to exhaust the generators at every level, as Andrew pointed out. Otherwise the outer generator-yielding loop will resume right where the inner generator left off instead of where the next "block" begins.

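import itertools
import iters  # assumed: the module that holds the functions above

stream = itertools.product('abc','1234', 'ABCD')
for a in iters.cleave_by_change(stream, lambda x:x[0]):
    for b in iters.cleave_by_change(a, lambda x:x[1]):
        print b.next()
        for sink in b: pass   # exhaust the inner generator
    for sink in a: pass       # exhaust the outer generator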

('a', '1', 'A')
('b', '1', 'A')
('c', '1', 'A')
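
One way to avoid relying on the caller to exhaust every sub-generator is to let the outer generator drain whatever is left of a group before it moves on. The following is a minimal Python 3 sketch of that idea, not the code from this post: the name `cleave_by_change_draining` and the sentinel handling are assumptions made for illustration.

```python
from itertools import product


def cleave_by_change_draining(stream, key_fn):
    """Yield one sub-generator per run of equal keys (hypothetical variant)."""
    it = iter(stream)
    sentinel = object()
    # First item of the next group, read ahead of time.
    pending = next(it, sentinel)

    def group(first, head_key):
        nonlocal pending
        yield first
        for item in it:
            if key_fn(item) != head_key:
                pending = item      # belongs to the next group
                return
            yield item
        # Source exhausted: pending stays at the sentinel.

    while pending is not sentinel:
        first, pending = pending, sentinel
        g = group(first, key_fn(first))
        yield g
        for _ in g:                 # drain what the caller left behind
            pass


# Taking only the first item of each group still lands on group boundaries.
heads = [next(g) for g in
         cleave_by_change_draining(iter((1, 1, 1, 2, 2, 2, 3, 2)), lambda x: x)]
print(heads)  # [1, 2, 3, 2]

# Nesting works without explicit "sink" loops.
nested = [next(b)
          for a in cleave_by_change_draining(product('ab', '12', 'XY'),
                                             lambda x: x[0])
          for b in cleave_by_change_draining(a, lambda x: x[1])]
print(nested)
# [('a', '1', 'X'), ('a', '2', 'X'), ('b', '1', 'X'), ('b', '2', 'X')]
```

The trade-off is that a sub-generator becomes empty as soon as the outer loop advances, so it has to be used inside the loop body, much like the groups returned by itertools.groupby.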
h2kyeong asked May 25 '12 04:05



2 Answers

adam's answer is good. this is just in case you're curious how to do it by hand:

def cleave_by_change(stream):
    def generator():
        head = stream[0]
        while stream and stream[0] == head:
            yield stream.pop(0)
    while stream:
        yield generator()

for g in cleave_by_change([1,1,1,2,2,3,2,2,2,2]):
    print list(g)

which gives:

[1, 1, 1]
[2, 2]
[3]
[2, 2, 2, 2]

(the previous version required a hack, or nonlocal in python 3, because i assigned to stream inside generator(), which by default made stream a separate local variable of generator(). credit to gnibbler in the comments.)

note that this approach is dangerous: if you don't "consume" the generators that are returned, the outer loop will keep handing out more and more of them, because stream is not getting any smaller.
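
To make the danger concrete, here is a small Python 3 sketch (an assumption, since the answer's code is Python 2) that caps the outer loop with itertools.islice so it cannot run forever. The sample list and the variable names are made up for illustration.

```python
from itertools import islice


def cleave_by_change(stream):
    # Same logic as above; note that it pops from (mutates) the list.
    def generator():
        head = stream[0]
        while stream and stream[0] == head:
            yield stream.pop(0)
    while stream:
        yield generator()


data = [1, 1, 2, 3]   # only three runs

# Ask for five sub-generators without consuming any of them.
unconsumed = list(islice(cleave_by_change(data), 5))
print(len(unconsumed))  # 5, more generators than there are runs
print(data)             # [1, 1, 2, 3], nothing was popped

# Consuming each sub-generator before asking for the next behaves as intended.
consumed = [list(g) for g in cleave_by_change(data)]
print(consumed)         # [[1, 1], [2], [3]]
print(data)             # [], the input list has been emptied
```

Without the islice cap, the first loop would never terminate.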

andrew cooke answered Oct 27 '22 18:10


For your second function, you can use itertools.groupby to accomplish this fairly easily.

Here's an alternate implementation that now yields generators instead of lists:

from itertools import groupby

def cleave_by_change2(stream, key_fn):
    return (group for key, group in groupby(stream, key_fn))

Here it is in action (with liberal printing along the way, so you can see what's going on):

main_gen = cleave_by_change2([1,1,1,2,2,3,2,2,2,2], lambda x: x)

print main_gen

for sub_gen in main_gen:
    print sub_gen
    print list(sub_gen)

Which yields:

<generator object <genexpr> at 0x7f17c7727e60>
<itertools._grouper object at 0x7f17c77247d0>
[1, 1, 1]
<itertools._grouper object at 0x7f17c7724850>
[2, 2]
<itertools._grouper object at 0x7f17c77247d0>
[3]
<itertools._grouper object at 0x7f17c7724850>
[2, 2, 2, 2]
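
The question also asks about deciding whether to take or skip a group after looking at its first item. A possible sketch with groupby, written in Python 3 syntax (an assumption) and using made-up log records: advancing the groupby iterator moves past any items left in the previous group, so a skipped group never has to be built as a list.

```python
from itertools import groupby

# Hypothetical records in the spirit of the repr lines from the question.
records = [
    ('result', 'case', 1),
    ('result', 'case', 2),
    ('dump', 'x'),
    ('dump', 'y'),
    ('result', 'case', 3),
]

kept = []
for kind, group in groupby(records, key=lambda rec: rec[0]):
    first = next(group)          # peek at the first item of the group
    if kind == 'dump':
        continue                 # skip it; groupby advances past the rest
    kept.append([first] + list(group))

print(kept)
# [[('result', 'case', 1), ('result', 'case', 2)], [('result', 'case', 3)]]
```

As with the hand-written versions, each group object is only usable until the outer loop advances.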
Adam Wagner answered Oct 27 '22 19:10