Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Align two lists by adding special values for missing entries

Tags:

python

I have two lists of values of the same sortable type, which are sorted in ascending order, but (i) they don't have the same length, and (ii) entries present in one list can be missing from the other and vice versa. However I know that the majority of values in one list are present in the other, and that there are no duplicates in any list.

So we may have this situation:

list1 = [value1-0, value1-1, value1-2, value1-3]
list2 = [value2-0, value2-1, value2-2]

If it happens that the order of the values from both lists is:

value1-0 < (value1-1 = value2-0) < value2-1 < value1-2 < value1-3 < value2-2

we can give combined sorted value names to the values across the two lists, e.g.:

valueA < valueB < valueC < valueD < valueE < valueF

so that the two lists can be written as:

list1 = [valueA, valueB, valueD, valueE]
list2 = [valueB, valueC, valueF]

Given this I want the lists to become:

new_list1 = [valueA,    valueB, "MISSING", valueD,    valueE,    "MISSING"]
new_list2 = ["MISSING", valueB, valueC,    "MISSING", "MISSING", valueF   ]

Can anyone help?

EDIT: The original question referred to datetime objects in particular (hence the comments specific to datetimes), but has been generalized to any sortable type.

like image 341
Orestis Tsinalis Avatar asked Aug 18 '13 21:08

Orestis Tsinalis


3 Answers

How about something like this:

set1 = set(list1)
set2 = set(list2)
total = sorted(set1|set2)

new_list1 = [x if x in set1 else "MISSING" for x in total]
new_list2 = [x if x in set2 else "MISSING" for x in total]
like image 176
arshajii Avatar answered Nov 20 '22 19:11

arshajii


This problem piqued my interest, so I wrote an overly generic solution.

Here's a function that

  • aligns any number of sequences
  • works on iterators, so it can efficiently handle long (or infinite) sequences
  • supports repeated values
  • is compatible with Python 2 and 3 (although I'd use align_iterables(*inputs, missing_value=None) if I didn't care about historical Python versions)

import itertools

def align_iterables(inputs, missing=None):
    """Align sorted iterables

    Yields tuples with values from the respective `inputs`, placing
    `missing` if the value does not exist in the corresponding
    iterable.

    Example: align_generator('bc', 'bf', '', 'abf') yields:
        (None, None, None, 'a')
        ('b', 'b', None, 'b')
        ('c', None, None, None)
        (None, 'f', None, 'f')
    """
    End = object()
    iterators = [itertools.chain(i, [End]) for i in inputs]
    values = [next(i) for i in iterators]
    while not all(v is End for v in values):
        smallest = min(v for v in values if v is not End)
        yield tuple(v if v == smallest else missing for v in values)
        values = [next(i) if v == smallest else v
                  for i, v in zip(iterators, values)]

# An adapter for this question's problem:

def align_two_lists(list1, list2, missing="MISSING"):
    value = list(zip(*list(align_iterables([list1, list2], missing=missing))))
    if not value:
        return [[], []]
    else:
        a, b = value
        return [list(a), list(b)]

# A set of tests for the question's problem:

if __name__ == '__main__':
    assert align_two_lists('abcef', 'abcdef', '_') == [['a', 'b', 'c', '_', 'e', 'f'], ['a', 'b', 'c', 'd', 'e', 'f']]
    assert align_two_lists('a', 'abcdef', '_') == [['a', '_', '_', '_', '_', '_'], ['a', 'b', 'c', 'd', 'e', 'f']]
    assert align_two_lists('abcdef', 'a', '_') == [['a', 'b', 'c', 'd', 'e', 'f'], ['a', '_', '_', '_', '_', '_']]
    assert align_two_lists('', 'abcdef', '_') == [['_', '_', '_', '_', '_', '_'], ['a', 'b', 'c', 'd', 'e', 'f']]
    assert align_two_lists('abcdef', '', '_') == [['a', 'b', 'c', 'd', 'e', 'f'], ['_', '_', '_', '_', '_', '_']]
    assert align_two_lists('ace', 'abcdef', '_') == [['a', '_', 'c', '_', 'e', '_'], ['a', 'b', 'c', 'd', 'e', 'f']]
    assert align_two_lists('bdf', 'ace', '_') == [['_', 'b', '_', 'd', '_', 'f'], ['a', '_', 'c', '_', 'e', '_']]
    assert align_two_lists('ace', 'bdf', '_') == [['a', '_', 'c', '_', 'e', '_'], ['_', 'b', '_', 'd', '_', 'f']]
    assert align_two_lists('aaacd', 'acd', '_') == [['a', 'a', 'a', 'c', 'd'], ['a', '_', '_', 'c', 'd']]
    assert align_two_lists('acd', 'aaacd', '_') == [['a', '_', '_', 'c', 'd'], ['a', 'a', 'a', 'c', 'd']]
    assert align_two_lists('', '', '_') == [[], []]

    list1 = ["datetimeA", "datetimeB", "datetimeD", "datetimeE"]
    list2 = ["datetimeB", "datetimeC", "datetimeD", "datetimeF"]

    new_list1 = ["datetimeA", "datetimeB", "MISSING", "datetimeD", "datetimeE", "MISSING"]
    new_list2 = ["MISSING", "datetimeB", "datetimeC", "datetimeD", "MISSING", "datetimeF"]

    assert align_two_lists(list1, list2) == [new_list1, new_list2]

# And some extra tests:

    # Also test multiple generators
    for expected, got in zip(
            [(None, None, None, 'a'),
             ('b', 'b', None, 'b'),
             ('c', None, None, None),
             (None, 'f', None, 'f')],
            align_iterables(['bc', 'bf', '', 'abf'])):
        assert expected == got

    assert list(align_iterables([])) == []

    # And an infinite generator
    for expected, got in zip(
            [(0, 0),
             ('X', 1),
             (2, 2),
             ('X', 3),
             (4, 4)],
            align_iterables([itertools.count(step=2), itertools.count()], missing='X')):
        assert expected == got
like image 28
Petr Viktorin Avatar answered Nov 20 '22 19:11

Petr Viktorin


You could try:

new_list1=[]
new_list2=[]

i=j=0
while True:
    print '1 ' + str(new_list1) +' '+str(i)
    print '2 ' + str(new_list2) +' '+str(j)
    if list1[i]==list2[j]:
        new_list1 += [list1[i]]
        new_list2 += [list2[j]]
        i=i+1
        j=j+1
    elif list1[i]>list2[j]:
        new_list1 += ["MISSING"]
        new_list2 += [list2[j]]
        j=j+1
    else: # list1[i]<list2[j]
        new_list1 += [list1[i]]
        new_list2 += ["MISSING"]
        i=i+1
    if i>=len(list1) or j>=len(list2):
        break
while i<len(list1):
    new_list1 += [list1[i]]
    new_list2 += ["MISSING"]
    i=i+1
while j<len(list2):
    new_list1 += ["MISSING"]
    new_list2 += [list2[j]]
    j=j+1

It looks like a lot of code but it should work and loops through the lists just once.

like image 1
Lorenzo Baracchi Avatar answered Nov 20 '22 19:11

Lorenzo Baracchi