Alright, so I have two lists, as such:
[1, 2, 3, 4, 5]
, [4, 5, 6, 7]
.[1, 2, 3, 4, 5]
, [3.5, 4, 5, 6, 7]
[9, 1, 1, 8, 7]
, [8, 6, 7]
.I want to merge the lists such that existing order is preserved, and to merge at the last possible valid position, and such that no data is lost. Additionally, the first list might be huge. My current working code is as such:
master = [1,3,9,8,3,4,5] addition = [3,4,5,7,8] def merge(master, addition): n = 1 while n < len(master): if master[-n:] == addition[:n]: return master + addition[n:] n += 1 return master + addition
What I would like to know is - is there a more efficient way of doing this? It works, but I'm slightly leery of this, because it can run into large runtimes in my application - I'm merging large lists of strings.
EDIT: I'd expect the merge of [1,3,9,8,3,4,5], [3,4,5,7,8] to be: [1,3,9,8,3,4,5,7,8]. For clarity, I've highlighted the overlapping portion.
[9, 1, 1, 8, 7], [8, 6, 7] should merge to [9, 1, 1, 8, 7, 8, 6, 7]
In Excel, you can merge two lists without duplicating any value by using the Remove Duplicates feature.
Use set() and list() to combine two lists while removing duplicates in the new list and keeping duplicates in original list. Call set(list_1) and set(list_2) to generate sets of the elements in list_1 and list_2 respectively which contain no duplicates.
Given two arrays, now our task is to merge or combine these arrays into a single array without duplicate values. So we can do this task using the Union() method. This method combines two arrays by removing duplicated elements in both arrays.
You can try the following:
>>> a = [1, 3, 9, 8, 3, 4, 5] >>> b = [3, 4, 5, 7, 8] >>> matches = (i for i in xrange(len(b), 0, -1) if b[:i] == a[-i:]) >>> i = next(matches, 0) >>> a + b[i:] [1, 3, 9, 8, 3, 4, 5, 7, 8]
The idea is we check the first i
elements of b
(b[:i]
) with the last i
elements of a
(a[-i:]
). We take i
in decreasing order, starting from the length of b
until 1 (xrange(len(b), 0, -1)
) because we want to match as much as possible. We take the first such i
by using next
and if we don't find it we use the zero value (next(..., 0)
). From the moment we found the i
, we add to a
the elements of b
from index i
.
There are a couple of easy optimizations that are possible.
You don't need to start at master[1], since the longest overlap starts at master[-len(addition)]
If you add a call to list.index
you can avoid creating sub-lists and comparing lists for each index:
This approach keeps the code pretty understandable too (and easier to optimize by using cython or pypy):
master = [1,3,9,8,3,4,5] addition = [3,4,5,7,8] def merge(master, addition): first = addition[0] n = max(len(master) - len(addition), 1) # (1) while 1: try: n = master.index(first, n) # (2) except ValueError: return master + addition if master[-n:] == addition[:n]: return master + addition[n:] n += 1
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With