Here are two functions that split an iterable into sub-lists. I suspect this kind of task gets programmed over and over. I use them to parse log files that consist of repr lines like ('result', 'case', 123, 4.56), ('dump', ..) and so on.
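For context, a sketch of the parsing step this assumes - each line is the repr of a tuple, so ast.literal_eval can turn it back into one (the helper name parse_line is just illustrative):

import ast

def parse_line(line):
    # safely evaluates a literal tuple repr back into a tuple
    return ast.literal_eval(line)

print parse_line("('result', 'case', 123, 4.56)")
# -> ('result', 'case', 123, 4.56)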
I would like to change these so that they yield iterators rather than lists: a sub-list may grow pretty large, but I can often decide whether to take it or skip it based on its first few items. Also, once an iterator version is available, I would like to nest the calls, whereas with these list versions that would waste memory by duplicating parts of the input.
But deriving multiple generators from a single iterable source wasn't easy for me, so I'm asking for help. If possible, I wish to avoid introducing new classes.
Also, if you know a better title for this question, please tell me. Thank you!
def cleave_by_mark(stream, key_fn, end_with_mark=False):
    '''Split at items where key_fn is true.
    end_with_mark=True: [f f t][t][f f]; False: [f f][t][t f f]'''
    buf = []
    for item in stream:
        if key_fn(item):
            if end_with_mark: buf.append(item)
            if buf: yield buf
            buf = []
            if end_with_mark: continue
        buf.append(item)
    if buf: yield buf
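For example, marking on truthy items (this matches the test runs further down):

for chunk in cleave_by_mark((1, 0, 0, 1, 1, 0), lambda x: x):
    print chunk,
# -> [1, 0, 0] [1] [1, 0]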
def cleave_by_change(stream, key_fn):
    '''[1 1 1][2 2][3][2 2 2 2]'''
    prev = None
    buf = []
    for item in stream:
        iden = key_fn(item)
        if prev is None: prev = iden
        if prev != iden:
            yield buf
            buf = []
            prev = iden
        buf.append(item)
    if buf: yield buf
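And grouping runs of equal keys, as the docstring illustrates:

for chunk in cleave_by_change((1, 1, 1, 2, 2, 3, 2, 2, 2, 2), lambda x: x):
    print chunk,
# -> [1, 1, 1] [2, 2] [3] [2, 2, 2, 2]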
Thanks to everyone's answers, I was able to write what I asked for! Of course, for the cleave_by_change function I could also just use itertools.groupby.
def cleave_by_mark(stream, key_fn, end_with_mark=False):
    hand = []                      # one-item buffer shared with gen()
    def gen():
        key = key_fn(hand[0])
        yield hand.pop(0)
        while 1:
            if end_with_mark and key: break        # block ends after a mark
            hand.append(stream.next())
            key = key_fn(hand[0])
            if (not end_with_mark) and key: break  # next block starts at a mark
            yield hand.pop(0)
    while 1:
        # allow StopIteration to end the main loop
        if not hand: hand.append(stream.next())
        yield gen()
for cl in cleave_by_mark(iter((1, 0, 0, 1, 1, 0)), lambda x: x):
    print list(cl),          # blocks start with 1
# -> [1, 0, 0] [1] [1, 0]

for cl in cleave_by_mark(iter((0, 1, 0, 0, 1, 1, 0)), lambda x: x):
    print list(cl),
# -> [0] [1, 0, 0] [1] [1, 0]

for cl in cleave_by_mark(iter((1, 0, 0, 1, 1, 0)), lambda x: x, True):
    print list(cl),          # blocks end with 1
# -> [1] [0, 0, 1] [1] [0]

for cl in cleave_by_mark(iter((0, 1, 0, 0, 1, 1, 0)), lambda x: x, True):
    print list(cl),
# -> [0, 1] [0, 0, 1] [1] [0]
def cleave_by_change(stream, key_fn):
    '''[1 1 1][2 2][3][2 2 2 2]'''
    hand = []
    def gen():
        headkey = key_fn(hand[0])
        yield hand.pop(0)
        while 1:
            hand.append(stream.next())
            key = key_fn(hand[0])
            if key != headkey: break   # key changed: the next block begins here
            yield hand.pop(0)
    while 1:
        # allow StopIteration to end the main loop
        if not hand: hand.append(stream.next())
        yield gen()
for cl in cleave_by_change(iter((1, 1, 1, 2, 2, 2, 3, 2)), lambda x: x):
    print list(cl),
# -> [1, 1, 1] [2, 2, 2] [3] [2]
CAUTION: If anyone is going to use these, be sure to exhaust the generators at every level, as Andrew pointed out; otherwise the outer generator-yielding loop will restart right where the inner generator left off instead of where the next "block" begins.
stream = itertools.product('abc', '1234', 'ABCD')
for a in iters.cleave_by_change(stream, lambda x: x[0]):
    for b in iters.cleave_by_change(a, lambda x: x[1]):
        print b.next()        # peek at the first item of the first sub-block
        for sink in b: pass   # exhaust the inner generator ...
        break                 # ... and take only one sub-block per group
    for sink in a: pass       # exhaust the outer generator before moving on

('a', '1', 'A')
('b', '1', 'A')
('c', '1', 'A')
If you want to return multiple values from a function, you can use a generator function with the yield keyword. A yield expression hands back one value at a time: the function returns a value, suspends itself, saves its local state, and resumes from that point when the next value is requested.
Unless your generator is infinite, you can iterate through it only once. When all the values have been produced, iteration stops and the for loop exits; if you call next() directly instead, you get an explicit StopIteration exception.
Each time the generator reaches a yield statement, it hands the yielded value to the for loop and goes to sleep. On each successive iteration, the generator resumes from where it paused, i.e. just after the most recent yield statement.
Calling a generator function does not run its body. Instead, it returns a special type of iterator called a generator. The body executes only when a value is consumed by calling the generator's next method, running until it encounters the yield keyword. The function itself can be called as many times as desired and returns a new generator each time.
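A minimal sketch of that life cycle (count_up_to is an illustrative helper, not from the question):

def count_up_to(n):
    # produces one value per next() call, suspending at each yield
    # and resuming right after it on the next call
    i = 1
    while i <= n:
        yield i
        i += 1

gen = count_up_to(3)
print gen.next()   # -> 1
print gen.next()   # -> 2
print list(gen)    # -> [3]; the generator is now exhausted
# a further gen.next() would raise StopIteration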
adam's answer is good. this is just in case you're curious how to do it by hand:
def cleave_by_change(stream):
    def generator():
        head = stream[0]
        while stream and stream[0] == head:
            yield stream.pop(0)
    while stream:
        yield generator()

for g in cleave_by_change([1, 1, 1, 2, 2, 3, 2, 2, 2, 2]):
    print list(g)
which gives:
[1, 1, 1]
[2, 2]
[3]
[2, 2, 2, 2]
(a previous version required a hack or, in python 3, nonlocal, because i assigned to stream inside generator(), which made (a second variable also called) stream local to generator() by default - credit to gnibbler in the comments).
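a small sketch of that scoping pitfall (outer() is just an illustrative wrapper):

def outer():
    stream = [1, 1, 2]
    def generator():
        # assigning to stream makes python treat it as local to
        # generator(), so this line raises UnboundLocalError when the
        # body first runs; in python 3, 'nonlocal stream' would fix it
        stream = stream[1:]
        yield stream
    return generator()

# outer().next()   # -> UnboundLocalError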
note that this approach is dangerous - if you don't "consume" the generators that are returned, then you will get more and more of them, because stream is not getting any smaller.
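for example, a sketch of that failure mode:

gens = cleave_by_change([1, 1, 2])
g1 = gens.next()   # stream is still [1, 1, 2]: nothing consumed yet
g2 = gens.next()   # so the outer loop happily hands out another generator
print list(g1)     # -> [1, 1]  (g1 only pops the 1s now)
print list(g2)     # -> [2]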
For your second function, you can use itertools.groupby to accomplish this fairly easily.
Here's an alternate implementation that now yields generators instead of lists:
from itertools import groupby

def cleave_by_change2(stream, key_fn):
    return (group for key, group in groupby(stream, key_fn))
Here it is in action (with liberal printing along the way, so you can see what's going on):
main_gen = cleave_by_change2([1, 1, 1, 2, 2, 3, 2, 2, 2, 2], lambda x: x)
print main_gen

for sub_gen in main_gen:
    print sub_gen
    print list(sub_gen)
Which yields:
<generator object <genexpr> at 0x7f17c7727e60>
<itertools._grouper object at 0x7f17c77247d0>
[1, 1, 1]
<itertools._grouper object at 0x7f17c7724850>
[2, 2]
<itertools._grouper object at 0x7f17c77247d0>
[3]
<itertools._grouper object at 0x7f17c7724850>
[2, 2, 2, 2]
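One caveat from the itertools documentation: each group shares the underlying iterator with the groupby object, so advancing to the next group invalidates the previous one. A quick sketch (groupby as imported above):

gb = groupby([1, 1, 2], lambda x: x)
key1, group1 = gb.next()
key2, group2 = gb.next()   # advancing gb invalidates group1
print list(group1)         # -> [] (group1's items are gone)
print list(group2)         # -> [2]

So if a group's data is needed later, store it as a list before moving on.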