Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pythonic way to merge two overlapping lists, preserving order

Alright, so I have two lists, as such:

  • They can and will have overlapping items, for example, [1, 2, 3, 4, 5], [4, 5, 6, 7].
  • There will not be additional items in the overlap, for example, this will not happen: [1, 2, 3, 4, 5], [3.5, 4, 5, 6, 7]
  • The lists are not necessarily ordered nor unique. [9, 1, 1, 8, 7], [8, 6, 7].

I want to merge the lists such that existing order is preserved, and to merge at the last possible valid position, and such that no data is lost. Additionally, the first list might be huge. My current working code is as such:

master = [1,3,9,8,3,4,5] addition = [3,4,5,7,8]  def merge(master, addition):     n = 1     while n < len(master):         if master[-n:] == addition[:n]:             return master + addition[n:]         n += 1     return master + addition 

What I would like to know is - is there a more efficient way of doing this? It works, but I'm slightly leery of this, because it can run into large runtimes in my application - I'm merging large lists of strings.

EDIT: I'd expect the merge of [1,3,9,8,3,4,5], [3,4,5,7,8] to be: [1,3,9,8,3,4,5,7,8]. For clarity, I've highlighted the overlapping portion.

[9, 1, 1, 8, 7], [8, 6, 7] should merge to [9, 1, 1, 8, 7, 8, 6, 7]

like image 454
Firnagzen Avatar asked May 05 '15 14:05

Firnagzen


People also ask

How do I combine two lists without duplicates?

In Excel, you can merge two lists without duplicating any value by using the Remove Duplicates feature.

How would you go about combining the lists into a single list with all duplicates removed?

Use set() and list() to combine two lists while removing duplicates in the new list and keeping duplicates in original list. Call set(list_1) and set(list_2) to generate sets of the elements in list_1 and list_2 respectively which contain no duplicates.

How do you combine two arrays without duplicate values in python?

Given two arrays, now our task is to merge or combine these arrays into a single array without duplicate values. So we can do this task using the Union() method. This method combines two arrays by removing duplicated elements in both arrays.


2 Answers

You can try the following:

>>> a = [1, 3, 9, 8, 3, 4, 5] >>> b = [3, 4, 5, 7, 8]  >>> matches = (i for i in xrange(len(b), 0, -1) if b[:i] == a[-i:]) >>> i = next(matches, 0) >>> a + b[i:] [1, 3, 9, 8, 3, 4, 5, 7, 8] 

The idea is we check the first i elements of b (b[:i]) with the last i elements of a (a[-i:]). We take i in decreasing order, starting from the length of b until 1 (xrange(len(b), 0, -1)) because we want to match as much as possible. We take the first such i by using next and if we don't find it we use the zero value (next(..., 0)). From the moment we found the i, we add to a the elements of b from index i.

like image 127
JuniorCompressor Avatar answered Sep 28 '22 17:09

JuniorCompressor


There are a couple of easy optimizations that are possible.

  1. You don't need to start at master[1], since the longest overlap starts at master[-len(addition)]

  2. If you add a call to list.index you can avoid creating sub-lists and comparing lists for each index:

This approach keeps the code pretty understandable too (and easier to optimize by using cython or pypy):

master = [1,3,9,8,3,4,5] addition = [3,4,5,7,8]  def merge(master, addition):     first = addition[0]     n = max(len(master) - len(addition), 1)  # (1)     while 1:         try:             n = master.index(first, n)       # (2)         except ValueError:             return master + addition          if master[-n:] == addition[:n]:             return master + addition[n:]         n += 1 
like image 24
thebjorn Avatar answered Sep 28 '22 18:09

thebjorn