I need to merge two iterators. I wrote this function:
def merge_no_repeat(iter1, iter2, key=None):
"""
a = iter([(2, 'a'), (4, 'a'), (6, 'a')])
b = iter([(1, 'b'), (2, 'b'), (3, 'b'), (4, 'b'), (5, 'b'), (6, 'b'), (7, 'b'), (8, 'b')])
key = lambda item: item[0]
fusion_no_repeat(a, b, key) ->
iter([(1, 'b'), (2, 'a'), (3, 'b'), (4, 'a'), (5, 'b'), (6, 'a'), (7, 'b'), (8, 'b')])
:param iter1: sorted iterator
:param iter2: sorted iterator
:param key: lambda get sorted key, default: lambda x: x
:return: merged iterator
"""
if key is None:
key = lambda x: x
element1 = next(iter1, None)
element2 = next(iter2, None)
while element1 is not None or element2 is not None:
if element1 is None:
yield element2
element2 = next(iter2, None)
elif element2 is None:
yield element1
element1 = next(iter1, None)
elif key(element1) > key(element2):
yield element2
element2 = next(iter2, None)
elif key(element1) == key(element2):
yield element1
element1 = next(iter1, None)
element2 = next(iter2, None)
elif key(element1) < key(element2):
yield element1
element1 = next(iter1, None)
This function works. But I think it's too complicated. Is it possible to make this function easiest using Python Standard Libraries?
The pytoolz library has an implementation of this. It doesn't look like it uses any non-standard-library functions so if you really don't want to include an external library you could probably just copy the code.
If you're interested in speed there's also a cython implementation of pytoolz.
One, this fails if either of the iterators returns None, you should probably catch StopIteration exceptions. Two, once one of the iterators has no more values, you can just return all the rest of the values of the other one.
I think this is easier to make if you use a small wrapper class around an iterator that makes the next value visible:
class NextValueWrapper(object):
def __init__(self, iterator):
self.iterator = iterator
self.next_value = None
self.finished = False
self.get()
def get(self):
if self.finished: return # Shouldn't happen, maybe raise an exception
value = self.next_value
try:
self.next_value = next(self.iterator)
except StopIteration:
self.finished = True
return value
Then the code becomes:
def merge(iter1, iter2, key=None):
if key is None:
key = lambda x: x
wrap1 = NextValueWrapper(iter1)
wrap2 = NextValueWrapper(iter2)
while not (wrap1.finished and wrap2.finished):
if (wrap2.finished or
(not wrap1.finished and
key(wrap1.next_value) <= key(wrap2.next_value))):
yield wrap1.get()
else:
yield wrap2.get()
This is untested. And it repeats. And it's Python 2, out of habit. Making it non-repeating is left as an exercise for the reader, I hadn't noticed that was a requirement too...
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With