Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Merge two sorted iterators without replace

Tags:

python

I need to merge two iterators. I wrote this function:

def merge_no_repeat(iter1, iter2, key=None):
    """
    a = iter([(2, 'a'), (4, 'a'), (6, 'a')])
    b = iter([(1, 'b'), (2, 'b'), (3, 'b'), (4, 'b'), (5, 'b'), (6, 'b'), (7, 'b'), (8, 'b')])
    key = lambda item: item[0]
    fusion_no_repeat(a, b, key) ->
                iter([(1, 'b'), (2, 'a'), (3, 'b'), (4, 'a'), (5, 'b'), (6, 'a'), (7, 'b'), (8, 'b')])
    :param iter1: sorted iterator
    :param iter2: sorted iterator
    :param key: lambda get sorted key, default: lambda x: x
    :return: merged iterator
    """
    if key is None:
        key = lambda x: x
    element1 = next(iter1, None)
    element2 = next(iter2, None)
    while element1 is not None or element2 is not None:
        if element1 is None:
            yield element2
            element2 = next(iter2, None)
        elif element2 is None:
            yield element1
            element1 = next(iter1, None)
        elif key(element1) > key(element2):
            yield element2
            element2 = next(iter2, None)
        elif key(element1) == key(element2):
            yield element1
            element1 = next(iter1, None)
            element2 = next(iter2, None)
        elif key(element1) < key(element2):
            yield element1
            element1 = next(iter1, None)

This function works. But I think it's too complicated. Is it possible to make this function easiest using Python Standard Libraries?

like image 708
Kirill Ermolov Avatar asked Oct 19 '22 06:10

Kirill Ermolov


2 Answers

The pytoolz library has an implementation of this. It doesn't look like it uses any non-standard-library functions so if you really don't want to include an external library you could probably just copy the code.

If you're interested in speed there's also a cython implementation of pytoolz.

like image 188
dshepherd Avatar answered Oct 21 '22 23:10

dshepherd


One, this fails if either of the iterators returns None, you should probably catch StopIteration exceptions. Two, once one of the iterators has no more values, you can just return all the rest of the values of the other one.

I think this is easier to make if you use a small wrapper class around an iterator that makes the next value visible:

class NextValueWrapper(object):
    def __init__(self, iterator):
        self.iterator = iterator
        self.next_value = None
        self.finished = False
        self.get()

    def get(self):
        if self.finished: return  # Shouldn't happen, maybe raise an exception
        value = self.next_value
        try:
            self.next_value = next(self.iterator)
        except StopIteration:
            self.finished = True
        return value

Then the code becomes:

def merge(iter1, iter2, key=None):
    if key is None:
        key = lambda x: x

    wrap1 = NextValueWrapper(iter1)
    wrap2 = NextValueWrapper(iter2)

    while not (wrap1.finished and wrap2.finished):
        if (wrap2.finished or 
               (not wrap1.finished and 
                key(wrap1.next_value) <= key(wrap2.next_value))):
            yield wrap1.get()
        else:
            yield wrap2.get()

This is untested. And it repeats. And it's Python 2, out of habit. Making it non-repeating is left as an exercise for the reader, I hadn't noticed that was a requirement too...

like image 24
RemcoGerlich Avatar answered Oct 21 '22 23:10

RemcoGerlich